Infoscience

master thesis

Where Did the News Come From? Detection of News Agency Releases in Historical Newspapers

Marxen, Lea  
August 18, 2023

Since their beginnings in the 1830s and 1840s, news agencies have played an important role in the national and international news market, aiming to deliver news as quickly and reliably as possible. While we know that newspapers have long used agency content to produce their stories, the extent to which agencies shape our news often remains unclear. Although researchers have already addressed this question, recently by using computational methods to assess the influence of news agencies at present, large-scale studies on the role of news agencies in the past remain rare. This thesis aims to bridge this gap by detecting news agencies in a large corpus of Swiss and Luxembourgish newspaper articles (the impresso corpus) for the years 1840-2000 using deep learning methods. To this end, we first build and annotate a multilingual dataset with news agency mentions, which we then use to train and evaluate several BERT-based agency detection and classification models. Based on these experiments, we choose two models (for French and German) for inference on the impresso corpus. Results show that about 10% of the articles explicitly reference news agencies, with the greatest share of agency content after 1940, although systematic citation of agencies had already begun slowly in the 1910s. Differences in the usage of agency content across time, countries and languages as well as between newspapers reveal a complex network of news flows, whose exploration provides many opportunities for future work.

Type
master thesis
Author(s)
Marxen, Lea  
Advisors
Ehrmann, Maud; Boros, Emanuela; Duering, Marten; Kaplan, Frédéric
Date Issued
2023-08-18
Total of pages
114 p.
Subjects
Machine Learning; Natural Language Processing; Information Extraction; Historical Documents; Digital Humanities; Media History

URL
Code repository: https://github.com/impresso/newsagency-classification
EPFL units
DHLAB
Relation
IsSupplementedBy: https://doi.org/10.5281/zenodo.8333933
Available on Infoscience
September 11, 2023
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/200669

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved