Infoscience

doctoral thesis

Text Representation Learning for Low Cost Natural Language Understanding

Mai, Jan Frederik Jonas Florian  
2023

Natural language processing and other artificial intelligence fields have witnessed impressive progress over the past decade. Although some of this progress is due to algorithmic advances in deep learning, most of it has arguably been enabled by scaling up general learning methods, such as language modeling, to more data, larger models, and increased compute resources. All else being equal, this comes at a substantially higher cost, which limits access for research teams with few resources and prevents further upscaling. Consequently, investigating lower-cost solutions is crucial for the future of the NLP field.

The compute cost of reaching a given performance level can be broken down into three factors: 1) the amount of compute needed to process a single example, 2) the amount of data required to train the model, and 3) the number of hyperparameter configurations needed to reach the desired performance. In this thesis, we aim to contribute to all three factors through scalable, general learning methods.

To address factor 1), we investigate sentence embedding methods based on simple word embedding summation. These methods often provide a strong baseline and are fast to compute, but they are fundamentally limited by their inability to capture word order; we propose a word embedding aggregation method that is sensitive to word order. Regarding factor 2), we introduce Emb2Emb, a framework for learning conditional text generation tasks in the embedding space of a text autoencoder. Since the autoencoder can be pretrained once on unlabeled data, training the task-specific conditional text generation model requires significantly less labeled data downstream. To reduce the amount of hyperparameter tuning (factor 3), we propose an evaluation protocol for deep learning optimizers that takes the cost of hyperparameter tuning into account, leading to actionable insights that can decrease the amount of tuning required. Finally, we introduce HyperMixer, an MLP-based neural architecture that can be viewed as a low-cost alternative to the popular Transformer architecture, since it empirically lowers the cost in terms of all three factors.
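
As a concrete illustration of the word-order limitation mentioned above, the short sketch below (written for this page, not taken from the thesis) builds a sentence embedding by summing word vectors; the vocabulary and the random vectors are hypothetical stand-ins for pretrained embeddings. Because addition is commutative, reordering the words leaves the embedding unchanged, which is exactly the weakness an order-sensitive aggregation method has to overcome.

    # Illustrative sketch, not code from the thesis: summation-based sentence
    # embeddings are invariant to word order.
    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical 50-dimensional word vectors standing in for pretrained embeddings.
    vocab = {w: rng.normal(size=50) for w in ["the", "dog", "bites", "man"]}

    def sum_embedding(tokens):
        """Sentence embedding as the sum of its word vectors."""
        return np.sum([vocab[t] for t in tokens], axis=0)

    a = sum_embedding(["the", "dog", "bites", "the", "man"])
    b = sum_embedding(["the", "man", "bites", "the", "dog"])
    print(np.allclose(a, b))  # True: both word orders yield the same vector

Any order-sensitive alternative must break this symmetry, for instance by composing word representations with a non-commutative operation.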

Type: doctoral thesis
DOI: 10.5075/epfl-thesis-9913
Author(s): Mai, Jan Frederik Jonas Florian
Advisors: Gatica-Perez, Daniel • Henderson, James
Jury: Prof. Volkan Cevher (president); Prof. Daniel Gatica-Perez, Dr James Henderson (thesis directors); Prof. Boi Faltings, Prof. Titouan Parcollet, Prof. Roy Schwartz (examiners)
Date Issued: 2023
Publisher: EPFL
Publisher place: Lausanne
Public defense date: 2023-06-15
Thesis number: 9913
Number of pages: 200
Subjects: natural language understanding • representation learning • efficient deep learning • conditional text generation • hyperparameter tuning • transformers
EPFL units: LIDIAP
Faculty: STI
School: IEM
Doctoral School: EDEE
Available on Infoscience: June 19, 2023
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/198519