Yes but.. Can ChatGPT Identify Entities in Historical Documents?

conference paper

Gonzalez-Gallardo, Carlos-Emiliano • Boros, Emanuela • Girdhar, Nancy

January 1, 2023

2023 ACM/IEEE Joint Conference on Digital Libraries, JCDL

23rd ACM/IEEE Joint Conference on Digital Libraries (JCDL)

Large language models (LLMs) have been leveraged for several years now, obtaining state-of-the-art performance in recognizing entities from modern documents. For the last few months, the conversational agent ChatGPT has "prompted" a lot of interest in the scientific community and the public due to its capacity for generating plausible-sounding answers. In this paper, we explore this ability by probing it in the named entity recognition and classification (NERC) task in primary sources (e.g., historical newspapers and classical commentaries) in a zero-shot manner and by comparing it with state-of-the-art LM-based systems. Our findings indicate several shortcomings in identifying entities in historical text that range from the consistency of entity annotation guidelines, entity complexity, and code-switching, to the specificity of prompting. Moreover, as expected, the inaccessibility of historical archives to the public (and thus on the Internet) also impacts its performance.

Type

conference paper

DOI

10.1109/JCDL57899.2023.00034

Web of Science ID

WOS:001098971300024

Author(s)

Gonzalez-Gallardo, Carlos-Emiliano

Boros, Emanuela

Girdhar, Nancy

Hamdi, Ahmed

Moreno, Jose G.

Doucet, Antoine

Corporate authors

ACM

Date Issued

2023-01-01

Publisher

Association for Computing Machinery

Publisher place

New York

Published in

2023 ACM/IEEE Joint Conference on Digital Libraries, JCDL

ISBN of the book

979-8-3503-9931-8

Start page

184

End page

189

Subjects

Technology • Named Entity Recognition and Classification • Large Language Models • Generative Pretrained Transformer • Historical Documents

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

Event name	Event place	Event date
23rd ACM/IEEE Joint Conference on Digital Libraries (JCDL)	Santa Fe, NM	June 26-30, 2023

Funder	Grant Number
ANNA	2019-1R40226
TER-MITRAD	AAPR2020-2019-8510010
Pypa	AAPR2021-2021-12263410

Available on Infoscience

February 20, 2024

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/204349