Disambiguating Discourse Connectives for Statistical Machine Translation

Meyer, Thomas; Hajlaoui, Najeh; Popescu-Belis, Andrei

doi:10.1109/TASLP.2015.2422576

2015

Abstract

This paper shows that the automatic labeling of discourse connectives with the relations they signal, prior to machine translation (MT), can be used by phrase-based statistical MT systems to improve their translations. This improvement is demonstrated here when translating from English to four target languages (French, German, Italian and Arabic) using several test sets from recent MT evaluation campaigns. Using automatically labeled data for training, tuning and testing MT systems is beneficial provided that the labels are sufficiently accurate, typically above 70%. To reach such an accuracy, a large array of features for discourse connective labeling (morpho-syntactic, semantic and discursive) is extracted using state-of-the-art tools and exploited in factored MT models. The translation of connectives is improved significantly, between 0.7% and 10% as measured with the dedicated ACT metric. The improvements depend mainly on the level of ambiguity of the connectives in the test sets.

Details

Title Disambiguating Discourse Connectives for Statistical Machine Translation

Author(s) Meyer, Thomas ; Hajlaoui, Najeh ; Popescu-Belis, Andrei

Published in IEEE/ACM Transactions on Audio, Speech and Language Processing

Volume 23

Issue 7

Pages 1184-1197

Date 2015

DOI https://doi.org/10.1109/TASLP.2015.2422576

Laboratories LIDIAP

Record creation date 2015-06-19
