000217508 245__ $$aA Method for Record Linkage with Sparse Historical Data
000217508 520__ $$aMassive digitization of archival material, coupled with automatic document processing techniques and data visualisation tools, offers great opportunities for reconstructing and exploring the past. An unprecedented wealth of historical data (e.g. names of persons, places, transaction records) can be gathered through the transcription and annotation of digitized documents and can thereby foster large-scale studies of past societies. Yet the transformation of handwritten documents into well-represented, structured and connected data is not straightforward and requires several processing steps. In this regard, a key issue is entity record linkage, a process that aims to link different mentions in texts that refer to the same entity. Also known as entity disambiguation, record linkage is essential in that it makes it possible to identify genuine individuals, to aggregate multi-source information about single entities, and to reconstruct networks across documents and document series. In this paper we present an approach to automatically identify coreferential entity mentions of type Person in a data set derived from Venetian apprenticeship contracts from the early modern period (16th-18th c.). Taking advantage of a manually annotated subset of the document series, we compute distances between pairs of mentions, combining various similarity measures based on (sparse) context information and person attributes.
000217508 6531_ $$aGarzoni
000217508 6531_ $$aRecord Linkage
000217508 6531_ $$aEntity Disambiguation
000217508 6531_ $$aNatural Language Processing
000217508 700__ $$0248581$$g242482$$aColavizza, Giovanni
000217508 700__ $$0248954$$g256249$$aEhrmann, Maud
000217508 700__ $$0248846$$g147407$$aRochat, Yannick
000217508 7112_ $$dJuly 11-16, 2016$$cKrakow, Poland$$aDigital Humanities Conference 2016
000217508 8564_ $$zPreprint$$yPreprint$$uhttps://infoscience.epfl.ch/record/217508/files/short-paper-garzoniLinkage.pdf$$s851009
