English-French Verb Phrase Alignment in Europarl for Tense Translation Modeling

Loaiciga, Sharid; Meyer, Thomas; Popescu-Belis, Andrei

2014

Abstract

This paper presents a method for verb phrase (VP) alignment in an English/French parallel corpus and its use for improving statistical machine translation (SMT) of verb tenses. The method starts from automatic word alignment performed with GIZA++, and relies on a POS tagger and a parser, in combination with several heuristics, in order to identify non-contiguous components of VPs, and to label the aligned VPs with their tense and voice on each side. This procedure is applied to the Europarl corpus, leading to the creation of a smaller, high-precision parallel corpus with about 320,000 pairs of finite VPs, which is made publicly available. This resource is used to train a tense predictor for translation from English into French, based on a large number of surface features. Three MT systems are compared: (1) a baseline phrase-based SMT; (2) a tense-aware SMT system using the above predictions within a factored translation model; and (3) a system using oracle predictions from the aligned VPs. For several tenses, such as the French 'imparfait', the tense-aware SMT system improves significantly over the baseline and is closer to the oracle system.
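The projection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the word alignment has already been converted to `i-j` index pairs, that the English VP token indices are known, and that the French side carries coarse POS tags. The tag names and the helpers `parse_alignment` and `project_vp` are hypothetical, chosen only for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical coarse tag set marking verbal tokens on the French side.
VERB_TAGS = {"VER", "AUX"}


def parse_alignment(line: str) -> Set[Tuple[int, int]]:
    """Parse 'i-j' pairs (source index, target index) into a set."""
    pairs = set()
    for item in line.split():
        src, tgt = item.split("-")
        pairs.add((int(src), int(tgt)))
    return pairs


def project_vp(en_vp: List[int],
               alignment: Set[Tuple[int, int]],
               fr_tags: List[str]) -> List[int]:
    """Return the French token indices aligned to the English VP tokens,
    keeping only tokens tagged as verbal. The result may be
    non-contiguous."""
    en = set(en_vp)
    return sorted({t for s, t in alignment
                   if s in en and fr_tags[t] in VERB_TAGS})


# Toy sentence pair; the English VP is "has ... spoken" (indices 1 and 3).
en_tokens = ["He", "has", "not", "spoken"]
fr_tokens = ["Il", "n'", "a", "pas", "parlé"]
fr_tags = ["PRO", "ADV", "AUX", "ADV", "VER"]
alignment = parse_alignment("0-0 1-2 2-1 2-3 3-4")

fr_vp = project_vp([1, 3], alignment, fr_tags)
print([fr_tokens[i] for i in fr_vp])
```

In this toy pair the negation particles separate the auxiliary from the participle, so the projected French VP is non-contiguous. A real pipeline would also need the heuristics the abstract mentions to handle alignment errors and to label tense and voice.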

Details

Title: English-French Verb Phrase Alignment in Europarl for Tense Translation Modeling

Author(s): Loaiciga, Sharid; Meyer, Thomas; Popescu-Belis, Andrei

Published in: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Pages: 674–681

Conference: The Ninth Language Resources and Evaluation Conference, Reykjavik, Iceland

Date: 2014

Keywords: Machine Translation; verb phrase alignment; verb tense

Laboratories: LIDIAP

Record appears in:
Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LIDIAP - L'IDIAP Laboratory
Scientific production and competences > Euler Center for Signal Processing
Conference Papers
Work produced at EPFL

Record creation date: 2014-04-19
