JRC-Names: Multilingual Entity Name variants and titles as Linked Data

Ehrmann, Maud; Jacquet, Guillaume; Steinberger, Ralf

doi:10.3233/SW-160228

2017

Abstract

Since 2004, the European Commission's Joint Research Centre (JRC) has been analysing the online version of printed media in over twenty languages and has automatically recognised and compiled large amounts of named entities (persons and organisations) and their many name variants. The collected variants not only include standard spellings in various countries, languages and scripts, but also frequently found spelling mistakes or lesser-used name forms, all occurring in real-life text (e.g. Benjamin/Binyamin/Bibi/Benyam'in/Biniamin/Беньямин/بنيامين Netanyahu/Netanjahu/Nétanyahou/Netahny/Нетаньяху/نتنياهو). This entity name variant data, known as JRC-Names, has been available for public download since 2011. In this article, we report on our efforts to render JRC-Names as Linked Data (LD), using the lexicon model for ontologies lemon. Besides adhering to Semantic Web standards, this new release goes beyond the initial one in that it includes titles found next to the names, as well as the date ranges in which the titles and the name variants were found. It also establishes links towards existing datasets, such as DBpedia and Talk-Of-Europe. As a multilingual linguistic linked dataset, JRC-Names can help bridge the gap between structured data and natural languages, thus supporting large-scale data integration, e.g. cross-lingual mapping, and web-based content processing, e.g. entity linking. JRC-Names is publicly available through the dataset catalogue of the European Union's Open Data Portal.

Details

Title JRC-Names: Multilingual Entity Name variants and titles as Linked Data

Author(s) Ehrmann, Maud ; Jacquet, Guillaume ; Steinberger, Ralf

Published in Semantic Web

Volume 8

Issue 2

Pages 283-295

Date 2017-01-02

Keywords

multilingual semantic web; linguistic linked data; lemon; named entity; name variants; linguistic resource

DOI https://doi.org/10.3233/SW-160228

Laboratories DHLAB

Record appears in DHLAB - Digital Humanities Laboratory; Peer-reviewed publications; Work outside EPFL; Journal Articles; Published

Record creation date 2016-05-20
