A Named Entity-Annotated Corpus of 19th Century Classical Commentaries
We release a multilingual named entity (NE) corpus of 19th-century commentaries on Sophocles’ Ajax. The selected commentaries are written in English, German and French, and are replete with Latin and Greek quotations. Bibliographic entities were annotated alongside traditional named entities, following our guidelines (Romanello & Najem-Meyer, 2022). The corpus contains about 300 annotated pages, 111,216 tokens and 7,334 entity mentions, and was featured in the HIPE-2022 shared task. Although named entity recognition (NER) yielded reassuring results, optical character recognition (OCR) errors and the extensive use of abbreviations left entity linking (EL) a challenging task. With these characteristics, the corpus is well suited to assessing how information extraction systems adapt to noisy, domain-specific, multilingual and multiscript environments.
10.5334_johd.150.pdf
main document
openaccess
CC BY