Infoscience

master thesis

Historical newspaper semantic segmentation using visual and textual features

Barman, Raphaël
June 21, 2019

Mass digitization and the opening of digital libraries have given access to a huge number of historical newspapers. To bring structure to these documents, current techniques generally proceed in two distinct steps: they first segment the digitized images into generic articles, then classify the text of the articles into finer-grained categories. Unfortunately, by losing the link between layout and text, these two steps cannot account for the fact that newspaper content items have distinctive visual features. This project proposes two main novelties. First, it introduces the idea of merging the segmentation and classification steps, resulting in a fine-grained semantic segmentation of newspaper images. Second, it proposes to use textual features, in the form of embedding maps, at the segmentation step. The semantic segmentation with four categories (feuilleton, weather forecast, obituary, and stock exchange table) is done using a fully convolutional neural network and reaches a mean intersection over union (mIoU) of 79.3%. The introduction of embedding maps improves overall performance by 3% and generalization across time and newspapers by 8% and 12%, respectively. This shows strong potential for considering the semantic aspect in the segmentation of newspapers and for using textual features to improve generalization.
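
To make the two technical notions in the abstract concrete, here is a minimal NumPy sketch of an embedding map and of the mIoU metric. It is an illustration under stated assumptions, not the thesis implementation: the function names, the `(x0, y0, x1, y1, vector)` word format, the channel-wise concatenation with the image, and the choice to average IoU only over classes present in the prediction or the ground truth are all hypothetical.

```python
import numpy as np

def embedding_map(height, width, words, dim):
    """Rasterize word vectors into a (height, width, dim) array.

    `words` is a list of (x0, y0, x1, y1, vector) tuples: a hypothetical
    OCR output format with pixel boxes and one embedding per word.
    Pixels not covered by any word stay at zero.
    """
    emb = np.zeros((height, width, dim), dtype=np.float32)
    for x0, y0, x1, y1, vector in words:
        emb[y0:y1, x0:x1, :] = np.asarray(vector, dtype=np.float32)
    return emb

def mean_iou(pred, target, num_classes):
    """Mean intersection over union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps: skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")

# Toy 4x4 page with one word box and 2-dimensional embeddings.
image = np.zeros((4, 4, 3), dtype=np.float32)
emb = embedding_map(4, 4, [(0, 0, 2, 2, [1.0, -1.0])], dim=2)
# Assumption: text channels are stacked onto the image channels.
net_input = np.concatenate([image, emb], axis=-1)

# Toy label maps with two classes to exercise the metric.
target = np.array([[0, 0, 1, 1]] * 4)
pred = np.array([[0, 1, 1, 1]] * 4)
score = mean_iou(pred, target, num_classes=2)
```

In this sketch the segmentation network would receive `net_input`, so each pixel carries both its visual channels and the embedding of the word that covers it. How the thesis actually fuses the two, and whether a background class enters its mIoU average, is not stated in the abstract.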

Type
master thesis
Author(s)
Barman, Raphaël
Advisors
Ehrmann, Maud
Ares Oliveira, Sofia
Clematide, Simon
Date Issued

2019-06-21

Publisher

EPFL

Publisher place

Lausanne

Total of pages

74

Written at

EPFL

EPFL units
DHMA  
DHLAB  
Faculty
CDH  
Section
DH-S  
Available on Infoscience
October 14, 2019
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/161996

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved