Infoscience

 
doctoral thesis

Learning computationally efficient static word and sentence representations

Gupta, Prakhar  
2021

Most Natural Language Processing (NLP) algorithms involve, in one way or another, distributed vector representations of linguistic units (primarily words and sentences), also known as embeddings. These embeddings come in two flavours: static (non-contextual) and contextual. In a static embedding, the vector representation of a word is independent of its context, whereas in a contextual embedding the word representation incorporates additional information from its surrounding context.

Recent advances in deep learning have allowed contextual embeddings to outperform their static counterparts. However, this gain in performance over static embeddings has come at the cost of lower computational efficiency, in terms of both computational resources and training and inference times, as well as reduced interpretability and a higher environmental cost. Consequently, static embedding models, despite being less expressive and powerful than contextual embedding models, remain relevant in Natural Language Processing research.

In this thesis, we propose improvements to the current state-of-the-art static word and sentence embedding models in three different settings. First, we propose an improved algorithm for learning word and sentence embeddings from raw text, modifying the Word2Vec training objective formulation and adding n-grams to the training so as to incorporate local contextual information; this yields improved unsupervised static word and sentence embeddings. Our second major contribution is learning cross-lingual static word and sentence representations from sentence-aligned parallel bilingual corpora. The resulting word and sentence embeddings outperform other bag-of-words bilingual embeddings on cross-lingual sentence retrieval and monolingual word similarity tasks, while staying competitive on cross-lingual word translation tasks. In our last major contribution, we harness the expressive power of contextual embedding models by distilling static word embeddings from them, providing improved word representations for computationally light tasks. This allows us to exploit the semantic information captured by contextual embedding models while maintaining computational efficiency at inference time.
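To illustrate the idea behind the first contribution, composing a sentence embedding from the static vectors of its words and its contiguous n-grams so that n-grams inject local contextual information, a minimal sketch could average the available vectors. The function and the toy lookup tables below are hypothetical illustrations, not the thesis implementation:

```python
def sentence_embedding(sentence, word_vecs, ngram_vecs, n=2):
    """Sketch: a static sentence embedding as the average of the
    sentence's word vectors and its contiguous n-gram vectors.
    word_vecs / ngram_vecs are hypothetical lookup tables mapping
    strings to lists of floats (all of the same dimension)."""
    tokens = sentence.lower().split()
    # vectors for individual in-vocabulary words
    vecs = [word_vecs[t] for t in tokens if t in word_vecs]
    # vectors for contiguous n-grams, adding local context
    for i in range(len(tokens) - n + 1):
        gram = " ".join(tokens[i:i + n])
        if gram in ngram_vecs:
            vecs.append(ngram_vecs[gram])
    if not vecs:  # nothing in vocabulary: return a zero vector
        return [0.0] * len(next(iter(word_vecs.values())))
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]

# toy 2-dimensional embedding tables, made up for illustration
word_vecs = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}
ngram_vecs = {"good movie": [1.0, 1.0]}
emb = sentence_embedding("good movie", word_vecs, ngram_vecs)
```

Here the embedding averages three vectors: the two word vectors and the one bigram vector, so the bigram's contribution shifts the sentence representation away from a plain bag-of-words average.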

Type
doctoral thesis
DOI
10.5075/epfl-thesis-7959
Author(s)
Gupta, Prakhar  
Advisors
Jaggi, Martin  
Jury

Prof. Karl Aberer (president); Prof. Martin Jaggi (thesis director); Dr. James Henderson, Dr. Michael Auli, Dr. Fabio Rinaldi (examiners)

Date Issued
2021
Publisher
EPFL
Publisher place
Lausanne
Public defense date
2021-11-26
Thesis number
7959
Number of pages
109

Subjects
  • Machine learning
  • Natural Language Processing
  • Representation learning
  • Word representations
  • Sentence Representations
  • Distributional semantics

EPFL units
MLO  
Faculty
IC  
School
IINFCOM  
Doctoral School
EDIC  
Available on Infoscience
November 22, 2021
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/183175

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.