Infoscience

doctoral thesis

Text Representation Learning for Low Cost Natural Language Understanding

Mai, Jan Frederik Jonas Florian  
2023

Natural language processing and other artificial intelligence fields have witnessed impressive progress over the past decade. Although some of this progress is due to algorithmic advances in deep learning, most of it has arguably been enabled by scaling up general learning methods, such as language modeling, to more data, larger models, and increased compute resources. All else being equal, this comes at a substantially higher cost, which limits access for research teams with few resources and prevents further upscaling. Consequently, investigating lower-cost solutions is crucial for the future of the NLP field.

The compute cost of reaching a given performance level can be broken down into three factors: 1) the amount of compute needed to process a single example, 2) the amount of data required to train the model, and 3) the number of hyperparameter configurations needed to reach the desired performance. In this thesis, we aim to contribute to all three factors through scalable, general learning methods.

To address factor 1), we investigate sentence embedding methods based on simple word embedding summation. These methods often provide a strong baseline and are fast to compute, but they are fundamentally limited by their inability to capture word order; we propose a word embedding aggregation method that is sensitive to word order. Regarding factor 2), we introduce Emb2Emb, a framework for learning conditional text generation tasks in the embedding space of a text autoencoder. Since the autoencoder can be pretrained once on unlabeled data, training the task-specific conditional text generation model requires significantly less labeled data downstream. To reduce the amount of hyperparameter tuning (factor 3), we propose an evaluation protocol for deep learning optimizers that takes the cost of hyperparameter tuning into account, leading to actionable insights that can decrease the amount of tuning required. Finally, we introduce HyperMixer, an MLP-based neural architecture that can be viewed as a low-cost alternative to the popular Transformer architecture, since it empirically lowers the cost in terms of all three factors.
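
As a concrete illustration of the word-order limitation mentioned above, the short sketch below (written for this page, not taken from the thesis) builds a sentence embedding by summing word vectors; the vocabulary and the random vectors are hypothetical stand-ins for pretrained embeddings. Because addition is commutative, reordering the words leaves the embedding unchanged, which is exactly the weakness an order-sensitive aggregation method has to overcome.

    # Illustrative sketch, not code from the thesis: summation-based sentence
    # embeddings are invariant to word order.
    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical 50-dimensional word vectors standing in for pretrained embeddings.
    vocab = {w: rng.normal(size=50) for w in ["the", "dog", "bites", "man"]}

    def sum_embedding(tokens):
        """Sentence embedding as the sum of its word vectors."""
        return np.sum([vocab[t] for t in tokens], axis=0)

    a = sum_embedding(["the", "dog", "bites", "the", "man"])
    b = sum_embedding(["the", "man", "bites", "the", "dog"])
    print(np.allclose(a, b))  # True: both word orders yield the same vector

Any order-sensitive alternative must break this symmetry, for instance by composing word representations with a non-commutative operation.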

Type: doctoral thesis
DOI: 10.5075/epfl-thesis-9913
Author(s): Mai, Jan Frederik Jonas Florian
Advisors: Gatica-Perez, Daniel • Henderson, James
Jury: Prof. Volkan Cevher (president); Prof. Daniel Gatica-Perez, Dr James Henderson (thesis directors); Prof. Boi Faltings, Prof. Titouan Parcollet, Prof. Roy Schwartz (examiners)
Date Issued: 2023
Publisher: EPFL
Publisher place: Lausanne
Public defense date: 2023-06-15
Thesis number: 9913
Number of pages: 200
Subjects: natural language understanding • representation learning • efficient deep learning • conditional text generation • hyperparameter tuning • transformers
EPFL units: LIDIAP
Faculty: STI
School: IEM
Doctoral School: EDEE
Available on Infoscience: June 19, 2023
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/198519