Studying Linguistic Changes on 200 Years of Newspapers

Buntinx, Vincent; Bornet, Cyril; Kaplan, Frédéric

conference paper not in proceedings

Digital Humanities 2016

Large databases of scanned newspapers open new avenues for studying linguistic evolution. Using a two-billion-word corpus covering 200 years of newspapers, we compare several methods for assessing how fast language changes. After critically evaluating an initial set of methods for measuring textual distance between subsets corresponding to consecutive years, we introduce the notion of a lexical kernel: the set of unique words that maintain themselves over long periods of time. Focusing on linguistic stability rather than linguistic change makes it possible to build more robust measures of long-term phenomena such as word resilience. By systematically comparing the results obtained on two subsets of the corpus, corresponding to two independent newspapers, we argue that the results do not depend on the specifics of the chosen corpus and are likely to reflect more general linguistic phenomena.
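The two ideas in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: the lexical kernel of a period is read here as the set of words attested in every year of that period, and Jaccard distance on vocabularies stands in for the textual distance measures, which the abstract does not name. The function names and the toy data are hypothetical.

```python
from typing import Dict, Iterable, Set


def lexical_kernel(words_by_year: Dict[int, Iterable[str]],
                   start: int, end: int) -> Set[str]:
    """Words attested in every year from start to end, inclusive.

    Assumption: "maintaining itself" is read as "occurs at least once
    in each year of the period"; the paper may use a different criterion.
    """
    years = range(start, end + 1)
    missing = [y for y in years if y not in words_by_year]
    if missing:
        raise KeyError(f"no data for years: {missing}")
    kernel = set(words_by_year[start])
    for year in years[1:]:
        kernel &= set(words_by_year[year])  # keep only words seen again
    return kernel


def jaccard_distance(a: Iterable[str], b: Iterable[str]) -> float:
    """Illustrative stand-in for a textual distance between two years."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty vocabularies are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


# Hypothetical toy data: one small bag of words per year.
toy = {
    1900: ["le", "journal", "de", "fiacre"],
    1901: ["le", "journal", "de", "automobile"],
    1902: ["le", "de", "automobile", "avion"],
}

kernel = lexical_kernel(toy, 1900, 1902)
distance = jaccard_distance(toy[1900], toy[1901])
```

On this toy input the kernel keeps only the words present in all three years, while the distance compares just one pair of consecutive years. That is one way to see why a stability-based measure can be less sensitive to year-to-year noise than a pairwise distance.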

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/126014

Name: abstract_dh2016_vbuntinx_cbornet_fkaplan.pdf

Access type: openaccess

Size: 539.75 KB

Format: Adobe PDF

Checksum (MD5): 40be2edb7806530453e02c79a3d9ca83