Abstract

Machine Translation (MT) has made considerable progress in the past two decades, particularly since the introduction of neural network models (NMT). During this time, the research community has mostly focused on modeling and evaluating MT systems at the sentence level. MT models learn to translate from large amounts of parallel sentences in different languages. The focus on sentences is a practical simplification that favors efficiency but has the disadvantage of missing relevant contextual information. Several studies have shown that the negative impact of this simplification is significant. One key point is that the discourse dependencies among distant words are ignored, resulting in a lack of coherence and cohesion in the text. The main objective of this thesis is to improve MT by including discourse-level constraints. In particular, we focus on the translation of entity mentions. We summarize our contributions in four points. First, we define an evaluation process to assess entity translations (i.e., nouns and pronouns) and propose an automatic metric to measure this phenomenon. Second, we carry out a proof-of-concept study and analyze how effective it is to include entity coreference resolution (CR) in translation. We conclude that CR significantly helps pronoun translation and improves overall translation quality according to human judgment. Third, we focus on the discourse connections at the sentence level. We propose enhancing the sequential model to infer long-term connections by incorporating a ‘self-attention’ mechanism. This mechanism gives direct and selective access to the context. Experiments on different language pairs show that our method outperforms various baselines, and the analysis confirms that the model emphasizes a broader context and captures syntactic-like structures. Fourth, we formulate the problem of document-level NMT and model inter-sentential connections among words with a hierarchical attention mechanism. Experiments on multiple data sets show significant improvement over two strong baselines and indicate that the source-side and target-side contexts are mutually complementary. Together, these results confirm that discourse significantly enhances translation quality, fulfilling our main thesis objective.

Our secondary objective is to improve the CR task by modeling the underlying connections among entities at the document level. This task is particularly challenging for current neural network models because it requires understanding and reasoning. First, we propose a method to detect entity mentions from partially annotated data. We then propose to model coreference with a graph of entities encoded in a pre-trained language model as an internal structure. The experiments show that these methods outperform various baselines. CR has the potential to help MT and other text generation tasks by maintaining coherence between the entity mentions.
