Discourse-level Annotation over Europarl for Machine Translation: Connectives and Pronouns

Popescu-Belis, Andrei; Meyer, Thomas; Liyanapathirana, Jeevanthi; Cartoni, Bruno; Zufferey, Sandrine

2012

Abstract

This paper describes methods and results for the annotation of two discourse-level phenomena, connectives and pronouns, over a multilingual parallel corpus. Excerpts from Europarl in English and French have been annotated with disambiguation information for connectives and pronouns, for about 3600 tokens. This data is then used in several ways: for cross-linguistic studies, for training automatic disambiguation software, and ultimately for training and testing discourse-aware statistical machine translation systems. The paper presents the annotation procedures and their results in detail, and overviews the first systems trained on the annotated resources and their use for machine translation.

Details

Title: Discourse-level Annotation over Europarl for Machine Translation: Connectives and Pronouns

Author(s): Popescu-Belis, Andrei; Meyer, Thomas; Liyanapathirana, Jeevanthi; Cartoni, Bruno; Zufferey, Sandrine

Published in: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC)

Date: 2012

Keywords: annotation; discourse connectives; parallel corpora; pronouns; statistical machine translation

Laboratories: LIDIAP

Record appears in:
Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LIDIAP - L'IDIAP Laboratory
Scientific production and competences > Euler Center for Signal Processing
Conference Papers
Work produced at EPFL

Record creation date: 2013-12-19
