conference paper

Investigating OCR-Sensitive Neurons to Improve Entity Recognition in Historical Documents

Boros, Emanuela
•
Ehrmann, Maud
2025
Sustainability and Empowerment in the Context of Digital Libraries - 26th International Conference on Asia-Pacific Digital Libraries, ICADL 2024, Proceedings
26th International Conference on Asia-Pacific Digital Libraries

This paper investigates the presence of OCR-sensitive neurons within the Transformer architecture and their influence on named entity recognition (NER) performance on historical documents. By analysing neuron activation patterns in response to clean and noisy text inputs, we identify and then neutralise OCR-sensitive neurons to improve model performance. Based on two open access large language models (Llama2 and Mistral), experiments demonstrate the existence of OCR-sensitive regions and show improvements in NER performance on historical newspapers and classical commentaries, highlighting the potential of targeted neuron modulation to improve models’ performance on noisy text.
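The identify-then-neutralise idea in the abstract can be illustrated with a minimal sketch. This record does not describe the paper's actual procedure, so everything below is an assumption made for illustration: the scoring rule (mean absolute activation difference between paired clean and noisy inputs), the top-k selection, neutralisation by zeroing, the function names, and the toy activation values. It is not the authors' implementation and uses no real model.

```python
from statistics import mean


def sensitivity_scores(clean, noisy):
    """Per-neuron mean absolute activation difference over paired inputs.

    `clean` and `noisy` are lists of activation vectors (one per input),
    aligned so that row i of each comes from the same underlying text.
    """
    n_neurons = len(clean[0])
    return [
        mean(abs(c[j] - n[j]) for c, n in zip(clean, noisy))
        for j in range(n_neurons)
    ]


def top_sensitive(scores, k):
    """Indices of the k neurons with the highest sensitivity score."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def neutralise(activations, neuron_ids):
    """Return a copy of the activations with the selected neurons set to zero."""
    ids = set(neuron_ids)
    return [
        [0.0 if j in ids else a for j, a in enumerate(row)]
        for row in activations
    ]


# Hypothetical toy data: 3 paired inputs, 4 neurons each.
clean = [[0.1, 0.5, 0.2, 0.9], [0.2, 0.4, 0.1, 0.8], [0.1, 0.6, 0.2, 0.7]]
noisy = [[0.1, 0.5, 1.2, 0.9], [0.2, 0.5, 1.1, 0.8], [0.2, 0.6, 1.4, 0.7]]

scores = sensitivity_scores(clean, noisy)
sensitive = top_sensitive(scores, k=1)   # neuron 2 reacts most to the noise
patched = neutralise(noisy, sensitive)   # neuron 2 zeroed in every row
```

In a real Transformer the activation vectors would have to be captured from the model's layers and the zeroing applied during the forward pass; that part is left out here because the record gives no details on how the authors did it.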

Type
conference paper
DOI
10.1007/978-981-96-0865-2_5
Scopus ID

2-s2.0-85213042221

Author(s)
Boros, Emanuela

École Polytechnique Fédérale de Lausanne

Ehrmann, Maud  

École Polytechnique Fédérale de Lausanne

Editors
Oliver, Gillian
•
Frings-Hessami, Viviane
•
Du, Jia Tina
•
Tezuka, Taro
Date Issued

2025

Publisher

Springer Science and Business Media Deutschland GmbH

Published in
Sustainability and Empowerment in the Context of Digital Libraries - 26th International Conference on Asia-Pacific Digital Libraries, ICADL 2024, Proceedings
Series title/Series vol.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); 15493 LNCS

ISSN (of the series)

1611-3349

0302-9743

Start page

54

End page

66

Subjects

Historical document processing

•

Neural network model analysis

•

OCR noise

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
DHLAB  
Event name

26th International Conference on Asia-Pacific Digital Libraries

Event place

Bandar Sunway, Malaysia

Event date

2024-12-04 - 2024-12-06

Funder

Swiss National Science Foundation

Available on Infoscience
January 26, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/244634