Infoscience

EPFL, École polytechnique fédérale de Lausanne

data paper

A Named Entity-Annotated Corpus of 19th Century Classical Commentaries

Romanello, Matteo • Najem-Meyer, Sven
2024
Journal of Open Humanities Data

We release a multilingual named entity (NE) corpus of 19th-century commentaries on Sophocles’ Ajax. The selected commentaries are written in English, German and French, but are also replete with Latin and Greek quotations. Bibliographic entities were annotated alongside traditional named entities, following our guidelines (Romanello & Najem-Meyer, 2022). The corpus contains about 300 annotated pages, 111,216 tokens and 7,334 entity mentions, and was featured in the HIPE-2022 shared task. Although named entity recognition (NER) yielded encouraging results, optical character recognition (OCR) errors and the extensive use of abbreviations kept entity linking (EL) challenging. These characteristics make the corpus well suited to assessing how information extraction systems adapt to noisy, domain-specific, multilingual and multiscript environments.
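Entity mentions, such as the 7,334 reported above, are spans that may cover several tokens. They are not individual tagged tokens. The sketch below shows one common way to group token-level tags into mentions. It assumes an IOB scheme, in which B- opens a mention, I- continues it, and O marks a token outside any mention. This scheme is widespread in NER datasets, but this page does not specify the corpus's file format. The helper `extract_mentions` and the type labels `PERS`, `WORK` and `LOC` are hypothetical.

```python
from typing import List, Optional, Tuple


def extract_mentions(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group per-token IOB tags into (type, start, end) spans; end is exclusive.

    Assumes tags look like "B-TYPE", "I-TYPE" or "O" (hypothetical scheme).
    """
    mentions: List[Tuple[str, int, int]] = []
    current_type: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        is_begin = tag.startswith("B-")
        is_inside = tag.startswith("I-")
        if is_inside and tag[2:] == current_type:
            continue  # the open mention extends over this token
        # Any other tag closes the open mention, if there is one.
        if current_type is not None:
            mentions.append((current_type, start, i))
            current_type = None
        if is_begin or is_inside:
            # B- starts a mention; a stray I- of a new type is leniently treated as a start.
            current_type, start = tag[2:], i
    if current_type is not None:
        mentions.append((current_type, start, len(tags)))
    return mentions


# Hypothetical tag sequence; the labels are illustrative, not the corpus's own.
example = ["B-PERS", "O", "B-WORK", "I-WORK", "I-WORK", "O", "B-PERS", "I-PERS"]
print(extract_mentions(example))
# [('PERS', 0, 1), ('WORK', 2, 5), ('PERS', 6, 8)]
```

A stray I- tag whose type differs from the open mention is treated here as the start of a new mention. This is a lenient choice, and a strict IOB2 validator might reject such a sequence instead.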

Name: 10.5334_johd.150.pdf
Type: Main Document
Version: Published version
Access type: Open access
License condition: CC BY
Size: 541.93 KB
Format: Adobe PDF
Checksum (MD5): bd33d647459eee90bb3b50d43d7b60c0
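The MD5 checksum listed above lets a reader confirm that a downloaded copy of the PDF is intact. Below is a minimal sketch using Python's standard-library `hashlib`. It assumes the file has been saved locally under its listed name, `10.5334_johd.150.pdf`. The helper names `md5_of_file` and `verify_download` are hypothetical. MD5 detects accidental corruption, but it is not a defence against deliberate tampering.

```python
import hashlib
from pathlib import Path

# Checksum as listed in the record above.
EXPECTED_MD5 = "bd33d647459eee90bb3b50d43d7b60c0"


def md5_of_file(path, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected: str = EXPECTED_MD5) -> bool:
    """Compare a file's MD5 digest with the expected value, ignoring hex case."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    pdf = Path("10.5334_johd.150.pdf")  # assumed local download location
    if pdf.exists():
        print("checksum OK" if verify_download(pdf) else "checksum MISMATCH")
    else:
        print(f"{pdf} not found; download it first")
```

Reading in fixed-size chunks keeps memory use constant regardless of file size. That makes little difference for a 541.93 KB PDF, but the same helper works unchanged on large corpus archives.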

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.