Word Sequence Modeling using Deep Learning: an End-to-end Approach and its Applications

For a long time, natural language processing (NLP) has relied on generative models with task-specific, manually engineered features. Recently, there has been a resurgence of interest in neural networks in the machine learning community, with these models obtaining state-of-the-art results in various fields such as computer vision, speech processing and natural language processing. The central idea behind these approaches is to learn features and models simultaneously, in an end-to-end manner, while making as few assumptions as possible. In NLP, word embeddings, which map the words of a dictionary onto a continuous low-dimensional vector space, have proven very effective for a wide variety of tasks while requiring almost no a priori linguistic assumptions. In this thesis, we investigate continuous representations of segments in a sentence for the purpose of solving NLP tasks that involve complex sentence-level relationships. Our sequence modeling approach is based on neural networks and takes advantage of word embeddings. A first approach models words in context in the form of continuous vector representations, which are used to solve the task of interest. With a compositional procedure that compresses arbitrarily sized segments into continuous vectors, the model is also able to capture long-range word dependencies. We first validate our approach on the task of bilingual word alignment, which consists of finding word correspondences between a pair of sentences in two different languages. Source and target words in context are modeled using convolutional neural networks, yielding representations that are then used to compute alignment scores. An aggregation operation enables unsupervised training for this task. We show that our model outperforms a standard generative model. The model is then extended to phrase prediction tasks, where phrases rather than single words are to be tagged.
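The word alignment pipeline described above (words in context encoded by a convolutional network, pairwise alignment scores, and an aggregation that makes unsupervised training possible) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the window size, the dot-product score, the "best source match per target word" aggregation, the margin ranking loss against a mismatched sentence, and every function name (`conv_context`, `aggregate`, `ranking_loss`, ...) are hypothetical choices. Weights are random and untrained, so the predicted links are not meaningful; only the data flow is.

```python
import math
import random

DIM = 4      # hypothetical embedding size
WINDOW = 3   # hypothetical convolution window width (odd, centred on the word)
HIDDEN = 5   # hypothetical size of each word-in-context vector


def make_embeddings(vocab, seed):
    """Hypothetical lookup table: one random DIM-sized vector per word."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in vocab}


def make_filter(seed):
    """Hypothetical convolution weights: HIDDEN rows of WINDOW * DIM entries."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(WINDOW * DIM)] for _ in range(HIDDEN)]


def conv_context(words, emb, weights):
    """One vector per word: tanh(W . concatenated embeddings of its window)."""
    half = WINDOW // 2
    pad = [0.0] * DIM  # zero padding at sentence boundaries
    padded = [pad] * half + [emb[w] for w in words] + [pad] * half
    out = []
    for i in range(len(words)):
        window = [x for vec in padded[i:i + WINDOW] for x in vec]
        out.append([math.tanh(sum(w * x for w, x in zip(row, window))) for row in weights])
    return out


def alignment_scores(src_ctx, tgt_ctx):
    """s[j][i]: dot product between target word j and source word i in context."""
    return [[sum(a * b for a, b in zip(t, s)) for s in src_ctx] for t in tgt_ctx]


def aggregate(scores):
    """Assumed aggregation: each target word keeps its best source match."""
    return sum(max(row) for row in scores)


def best_links(scores):
    """Predicted alignment: (target index, best source index) per target word."""
    return [(j, max(range(len(row)), key=row.__getitem__)) for j, row in enumerate(scores)]


def ranking_loss(pos_score, neg_score, margin=1.0):
    """Hinge loss: a true pair should outscore a mismatched pair by `margin`."""
    return max(0.0, margin - pos_score + neg_score)


# Usage on a toy sentence pair (all data hypothetical).
src = "the house is small".split()
tgt = "la maison est petite".split()
neg = "le chat dort bien".split()  # mismatched target sentence used as a negative
src_emb = make_embeddings(sorted(set(src)), seed=0)
tgt_emb = make_embeddings(sorted(set(tgt + neg)), seed=1)
src_ctx = conv_context(src, src_emb, make_filter(seed=2))
tgt_filter = make_filter(seed=3)
pos_scores = alignment_scores(src_ctx, conv_context(tgt, tgt_emb, tgt_filter))
neg_scores = alignment_scores(src_ctx, conv_context(neg, tgt_emb, tgt_filter))
loss = ranking_loss(aggregate(pos_scores), aggregate(neg_scores))
links = best_links(pos_scores)
```

The point of the aggregation is that no word-level alignment labels are needed: only sentence pairs (true versus mismatched) supervise the loss, and word links are read off the score matrix afterwards.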
These tasks have typically been cast as classic word tagging problems, using special tagging schemes to identify segment boundaries. The proposed neural model focuses on learning fixed-size representations of arbitrarily sized chunks of words, which are then used to solve the tagging task. A compositional operation is introduced in this work to compute these representations. We demonstrate the viability of the proposed representations by evaluating the approach on the multiword expression tagging task. The remainder of this thesis addresses the task of syntactic constituency parsing which, unlike the tasks above, aims at producing a structured output for an input sentence, in the form of a tree. Syntactic parsing is cast as multiple phrase prediction problems that are solved recursively in a greedy manner. An extension using recursive compositional vector representations, which allows lexical information to be propagated from early stages, is explored as well. This approach is evaluated on a standard corpus, obtaining performance comparable to generative models with much shorter computation time. Finally, morphological tags are included as additional features, using a similar composition procedure, to improve parsing performance for morphologically rich languages. State-of-the-art results were obtained for these tasks and languages.
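The two ideas in the paragraph above, a compositional operation that folds an arbitrarily sized chunk into a fixed-size vector and greedy parsing that repeatedly predicts phrases and builds on their composed representations, can be illustrated with the sketch below. It rests on assumptions that are not taken from the thesis: composition is modeled as a repeated binary step `tanh(W . [a; b])`, phrases are scored by a hypothetical linear scorer `u`, and each step merges only the single best-scoring chunk without predicting a constituent label (the thesis describes predicting multiple phrases per level). All names are hypothetical and the weights are random, so the resulting tree only demonstrates the mechanics.

```python
import math
import random

DIM = 4  # hypothetical size of every word and phrase representation


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def compose_pair(a, b, W):
    """Assumed binary composition: p = tanh(W . [a; b]), output size DIM."""
    ab = a + b  # concatenation of the two DIM-sized vectors
    return [math.tanh(sum(w * x for w, x in zip(row, ab))) for row in W]


def compose(vectors, W):
    """Fold an arbitrarily long chunk into one fixed-size vector."""
    acc = vectors[0]
    for v in vectors[1:]:
        acc = compose_pair(acc, v, W)
    return acc


def greedy_parse(words, emb, W, u, max_len=3):
    """Greedy bottom-up parsing: merge the best-scoring chunk until one node remains."""
    nodes = [(w, emb[w]) for w in words]  # (subtree, vector) pairs
    while len(nodes) > 1:
        best = None
        for size in range(2, min(max_len, len(nodes)) + 1):
            for start in range(len(nodes) - size + 1):
                # Composed vectors of earlier merges feed later ones, so
                # lexical information propagates up the tree.
                vec = compose([v for _, v in nodes[start:start + size]], W)
                score = sum(a * b for a, b in zip(u, vec))  # hypothetical scorer
                if best is None or score > best[0]:
                    best = (score, start, size, vec)
        _, start, size, vec = best
        subtree = tuple(t for t, _ in nodes[start:start + size])
        nodes[start:start + size] = [(subtree, vec)]
    return nodes[0]


def leaves(tree):
    """Flatten a nested-tuple tree back into its words."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree for w in leaves(child)]


# Usage on a toy sentence (random, untrained parameters).
rng = random.Random(0)
sentence = "the cat sat on the mat".split()
emb = {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in sorted(set(sentence))}
W = rand_matrix(DIM, 2 * DIM, rng)
u = [rng.uniform(-1, 1) for _ in range(DIM)]
tree, root_vec = greedy_parse(sentence, emb, W, u)
```

Because every chunk, whatever its length, ends up as a DIM-sized vector, the same scorer can compare candidate phrases of different sizes, which is what makes the greedy, level-by-level formulation possible.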


  • Thesis submitted - Forthcoming publication

Related material