Named Entity Processing for Historical Texts

Ehrmann, Maud; Romanello, Matteo; Clematide, Simon

2019

Abstract

Recognition and identification of real-world entities is at the core of virtually any text mining application. Referential units such as names of persons, locations and organizations underlie the semantics of texts and guide their interpretation. Since the seminal Message Understanding Conference (MUC) evaluation cycle in the 1990s, tasks related to named entities (NEs) have evolved considerably, extending from entity recognition and classification to entity disambiguation and linking. More recently, NE processing has been called upon to contribute to the digital humanities (DH), where the massive digitization of historical documents is producing huge amounts of text. As a result, NE processing tools are increasingly applied to historical documents. Research in this area targets texts of different natures (e.g., publications by cultural institutions, state-related documents, genealogical data, historical newspapers) and addresses different tasks (NE recognition and classification, entity linking, or both). Experiments cover different time periods (from the 16th to the 20th century), focus on different domains, and use different entity typologies. This great variety shows how numerous and varied the needs, and the challenges, are, but it makes performance comparison difficult, if not impossible. The objective of this tutorial is to provide participants with essential knowledge about a) NE processing in general and in DH, and b) how to apply NE recognition approaches.