Beyond Keyword Search: Semantic Indexing and Exploration of Large Collections of Historical Newspapers

Ehrmann, Maud

2019

Abstract

Long held on library and archive shelves, historical newspapers are currently undergoing mass digitization, and millions of facsimiles, along with their machine-readable content acquired via Optical Character Recognition, are becoming accessible through a variety of online portals. While this represents a major step forward in the preservation of and access to documents, much remains to be done to provide extensive and sophisticated access to the content of these digital resources. We believe that the promise of newspaper digitization lies in semantic indexing, closely tied to the development of co-designed interfaces that accommodate text analysis research tools and their use by humanities scholars. How can we go beyond keyword search? How can we explore complex and vast amounts of data? Drawing on the ongoing project ‘impresso - Media Monitoring of the Past’, in this talk I will present our interdisciplinary approach and share hands-on experience in going from facsimiles to enhanced search and visualization capabilities that support historical research.

Details

Title Beyond Keyword Search: Semantic Indexing and Exploration of Large Collections of Historical Newspapers

Author(s) Ehrmann, Maud

Conference Digital Humanities in the Nordic Countries, Copenhagen, Denmark, March 2019

Date 2019-03-06

Keywords

digital humanities; historical newspapers; information extraction; information retrieval; semantic indexing; natural language processing; digital library

Note Keynote talk.

Laboratories DHLAB

Record creation date 2019-10-13