Abstract

Recent research in natural language processing has leveraged attention-based models to produce state-of-the-art results in a wide variety of tasks. Using transfer learning, generic models like BERT can be fine-tuned for domain-specific tasks with little annotated data. In the field of digital humanities and classics, bibliographical reference extraction is among the domain-specific tasks for which few annotated datasets are available. It therefore remains a highly challenging Named Entity Recognition (NER) problem that these approaches have not yet addressed. In this study, we attempt to improve bibliographical reference extraction with various transfer learning strategies. We compare three transformer models to a Conditional Random Fields (CRF) baseline developed by Romanello, using both generic and domain-specific pre-training. Experiments show that transformers consistently improve on the CRF baselines. However, domain-specific pre-training yields no significant benefits. We discuss and compare these results in light of comparable research on domain-specific NER.
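
To illustrate the kind of fine-tuning described above, the sketch below sets up a generic pre-trained transformer for token classification on reference spans using the Hugging Face Transformers library. The checkpoint name, the B-REF/I-REF tag set, and the toy footnote example are assumptions made for illustration only; they are not the data, labels, or configuration used in the study.

    # Illustrative sketch: fine-tune a generic transformer for reference-extraction NER.
    # Checkpoint, tag set, and training example are placeholders, not the paper's setup.
    from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                              TrainingArguments, Trainer)

    labels = ["O", "B-REF", "I-REF"]              # hypothetical tags for reference spans
    model_name = "bert-base-multilingual-cased"   # a generic pre-trained checkpoint

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(
        model_name, num_labels=len(labels))

    # One toy training example: footnote tokens with word-level tags.
    words = ["See", "Hom.", "Il.", "1.1", "."]
    word_tags = ["O", "B-REF", "I-REF", "I-REF", "O"]

    enc = tokenizer(words, is_split_into_words=True, truncation=True)
    # Align word-level tags to sub-word tokens; special tokens get -100 (ignored by the loss).
    aligned = [-100 if i is None else labels.index(word_tags[i]) for i in enc.word_ids()]

    train_dataset = [{"input_ids": enc["input_ids"],
                      "attention_mask": enc["attention_mask"],
                      "labels": aligned}]

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="ref-ner", num_train_epochs=1,
                               per_device_train_batch_size=1, report_to=[]),
        train_dataset=train_dataset,
    )
    trainer.train()

In a realistic setting the single toy example would be replaced by a dataset of annotated footnotes, and the same setup could be repeated with a domain-specifically pre-trained checkpoint to compare against the generic one.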
