Diachronic Evaluation of NER Systems on Old Newspapers

In recent years, many cultural institutions have undertaken large-scale newspaper digitization projects, and large amounts of historical text are being acquired (via transcription or OCR). Beyond document preservation, the next step is to provide enhanced access to the content of these digital resources. In this regard, the processing of units that act as referential anchors, namely named entities (NE), is of particular importance. Yet applying standard NE tools to historical texts poses several challenges, and performance is often lower than on contemporary documents. This paper investigates the performance of different NE recognition tools applied to old newspapers by conducting a diachronic evaluation over 7 time series taken from the archives of the Swiss newspaper Le Temps.

Dipper, Stephanie
Neubarth, Friedrich
Zinsmeister, Heike
Published in:
Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016), 97-107
Presented at:
13th Conference on Natural Language Processing (KONVENS 2016), Bochum, Germany, September 19-21, 2016
Bochum, Germany, Bochumer Linguistische Arbeitsberichte

Record created on 2016-09-18, modified on 2019-08-12
