Abstract

In this dissertation, we propose several methods to improve transfer learning for pretrained language models (PLMs). Transfer learning is a powerful technique in natural language processing, in which a language model is first pretrained on a data-rich task and then fine-tuned on a downstream task.

Our first contribution is two learning strategies for training neural models that are more robust to dataset biases and transfer better to out-of-domain datasets. We specify the biases in terms of bias-only models, which learn to exploit the dataset biases. The bias-only models' predictions are then used to adjust the loss of the base model, reducing its reliance on biases by down-weighting the biased examples and focusing training on the hard examples.

Our second contribution is an effective regularization method that reduces overfitting when fine-tuning PLMs on low-resource tasks. We leverage the Variational Information Bottleneck to suppress irrelevant features, and show that our method effectively reduces overfitting, finds sentence representations that are more robust to biases, and substantially improves generalization to out-of-domain datasets.

Our third contribution is an effective and parameter-efficient way to fine-tune PLMs in a multi-task learning setup while allowing generalization to new domains. The method shares information across tasks to enable positive transfer to low-resource and related tasks while avoiding negative task interference. It employs a compact hypernetwork, shared across tasks and layers, that learns to generate task- and layer-specific adapter parameters: the hypernetwork shares knowledge across tasks, while the generated task-specific adapters let the model adapt to each individual task.

Our fourth contribution is Compacter, a method for fine-tuning large-scale language models with a better trade-off between task performance and the number of trainable parameters than prior work. Compacter builds on ideas from adapters, low-rank optimization, and parameterized hypercomplex multiplication layers. Specifically, it inserts task-specific weight matrices into a PLM's weights, computed efficiently as a sum of Kronecker products between shared "slow" weights and rank-one, layer-specific "fast" matrices. By training only 0.047% of a PLM's parameters, Compacter performs on par with standard fine-tuning and outperforms it in low-resource settings.

Our final contribution is Perfect, a simple and efficient method for few-shot fine-tuning of PLMs that relies on no handcrafting and is highly effective with as few as 32 data points. This contrasts with prior methods, which require carefully engineered prompts and verbalizers to convert examples into a cloze format that the PLM can score. Perfect makes two key design choices. First, we show that manually engineered task prompts can be replaced with task-specific adapters that enable sample-efficient fine-tuning and reduce memory and storage costs by roughly factors of 5 and 100, respectively. Second, instead of using handcrafted verbalizers, we learn a new multi-token label embedding during fine-tuning that is not tied to the model vocabulary and avoids complex autoregressive decoding. Perfect enables nearly 100x faster training and inference and outperforms existing state-of-the-art few-shot learning methods.
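
To make the first contribution's loss adjustment concrete, the sketch below combines a base model and a bias-only model in a product-of-experts style, so that examples the bias-only model already handles contribute smaller gradients to the base model. The function name, tensor shapes, and random logits are illustrative assumptions, not the dissertation's exact implementation.

    import torch
    import torch.nn.functional as F

    def product_of_experts_loss(main_logits, bias_logits, labels):
        # Combine the two experts in log space; the bias-only expert is detached
        # so that only the base model receives gradients.
        combined = F.log_softmax(main_logits, dim=-1) + \
                   F.log_softmax(bias_logits, dim=-1).detach()
        # Cross-entropy on the renormalized combined prediction: examples the
        # bias-only model classifies confidently yield small gradients for the
        # base model, effectively down-weighting biased examples.
        return F.nll_loss(F.log_softmax(combined, dim=-1), labels)

    # Toy usage with random tensors standing in for the two models' outputs.
    main_logits = torch.randn(8, 3, requires_grad=True)  # base model over the full input
    bias_logits = torch.randn(8, 3)                      # bias-only model, e.g. hypothesis-only
    labels = torch.randint(0, 3, (8,))
    product_of_experts_loss(main_logits, bias_logits, labels).backward()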
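
For the second contribution, the following is a minimal sketch of a variational information bottleneck head placed on top of an encoder's sentence embedding: the representation is compressed into a low-dimensional Gaussian latent, and a KL term penalizes information irrelevant to the task. The class name, bottleneck size, and beta value are assumptions chosen for illustration only.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VIBHead(nn.Module):
        def __init__(self, hidden=768, bottleneck=64, num_labels=3, beta=1e-3):
            super().__init__()
            self.mu = nn.Linear(hidden, bottleneck)
            self.logvar = nn.Linear(hidden, bottleneck)
            self.classifier = nn.Linear(bottleneck, num_labels)
            self.beta = beta

        def forward(self, sentence_emb, labels):
            mu, logvar = self.mu(sentence_emb), self.logvar(sentence_emb)
            # Reparameterization trick: sample z ~ N(mu, sigma^2).
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            task_loss = F.cross_entropy(self.classifier(z), labels)
            # KL(q(z|x) || N(0, I)) suppresses features irrelevant to the task.
            kl = -0.5 * torch.mean(
                torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1))
            return task_loss + self.beta * kl

    head = VIBHead()
    loss = head(torch.randn(4, 768), torch.randint(0, 3, (4,)))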
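
For the third contribution, this sketch shows the general shape of a hypernetwork that maps learned task and layer embeddings to the weights of a bottleneck adapter, so a single set of generator parameters serves every task and layer. All names and sizes are illustrative, and the full method involves details not shown here.

    import torch
    import torch.nn as nn

    class AdapterHyperNet(nn.Module):
        def __init__(self, num_tasks, num_layers, emb_dim=64, hidden=768, bottleneck=32):
            super().__init__()
            self.task_emb = nn.Embedding(num_tasks, emb_dim)
            self.layer_emb = nn.Embedding(num_layers, emb_dim)
            self.hidden, self.bottleneck = hidden, bottleneck
            # One shared generator produces the down- and up-projection weights.
            self.to_down = nn.Linear(2 * emb_dim, hidden * bottleneck)
            self.to_up = nn.Linear(2 * emb_dim, bottleneck * hidden)

        def forward(self, task_id, layer_id, hidden_states):
            emb = torch.cat([self.task_emb(task_id), self.layer_emb(layer_id)], dim=-1)
            down = self.to_down(emb).view(self.hidden, self.bottleneck)
            up = self.to_up(emb).view(self.bottleneck, self.hidden)
            # Generated bottleneck adapter with a residual connection.
            return hidden_states + torch.relu(hidden_states @ down) @ up

    # The same hypernetwork generates adapters for every (task, layer) pair.
    hypernet = AdapterHyperNet(num_tasks=4, num_layers=12)
    h = torch.randn(2, 16, 768)  # (batch, sequence, hidden)
    out = hypernet(torch.tensor(0), torch.tensor(5), h)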
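
For the fourth contribution, the sketch below parameterizes one adapter projection as a sum of Kronecker products between small shared "slow" matrices and rank-one layer-specific "fast" factors, which is the core computation described above. Dimensions, initialization, and the class name are illustrative assumptions.

    import torch
    import torch.nn as nn

    class PHMDownProjection(nn.Module):
        def __init__(self, in_dim=768, out_dim=32, n=4):
            super().__init__()
            assert in_dim % n == 0 and out_dim % n == 0
            self.n = n
            # "Slow" weights: n small n x n matrices, meant to be shared across layers.
            self.A = nn.Parameter(torch.randn(n, n, n) * 0.01)
            # "Fast" rank-one factors, specific to this layer.
            self.s = nn.Parameter(torch.randn(n, in_dim // n, 1) * 0.01)
            self.t = nn.Parameter(torch.randn(n, 1, out_dim // n) * 0.01)

        def forward(self, x):
            # B_i = s_i t_i^T is rank one; W = sum_i A_i kron B_i.
            B = self.s @ self.t  # (n, in_dim/n, out_dim/n)
            W = sum(torch.kron(self.A[i], B[i]) for i in range(self.n))
            return x @ W

    down = PHMDownProjection()
    x = torch.randn(2, 16, 768)
    print(down(x).shape)  # torch.Size([2, 16, 32])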
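
For the final contribution, this sketch shows one plausible form of a learned multi-token label embedding that is not tied to the model vocabulary: each label gets one trainable vector per mask position, and labels are scored against the encoder's hidden states at those positions. The distance-based scoring and all sizes are assumptions made for illustration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LabelEmbeddingHead(nn.Module):
        def __init__(self, num_labels=3, num_mask_tokens=2, hidden=768):
            super().__init__()
            # One trainable embedding per (label, mask position), untied from the vocabulary.
            self.label_emb = nn.Parameter(
                torch.randn(num_labels, num_mask_tokens, hidden) * 0.02)

        def forward(self, mask_hidden_states, labels=None):
            # mask_hidden_states: (batch, num_mask_tokens, hidden) from the encoder.
            # Score each label by negative squared distance summed over mask positions.
            diff = mask_hidden_states.unsqueeze(1) - self.label_emb.unsqueeze(0)
            logits = -diff.pow(2).sum(dim=(-1, -2))  # (batch, num_labels)
            if labels is None:
                return logits
            return F.cross_entropy(logits, labels)

    head = LabelEmbeddingHead()
    loss = head(torch.randn(4, 2, 768), labels=torch.randint(0, 3, (4,)))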
