Parameter-efficient Methods for Training and Inference of Transformer-based Models
Recent advances in natural language processing, driven by the success of large language models, have scaled models to unprecedented sizes. While these models achieve remarkable performance across a wide range of tasks, their computational and memory demands pose significant challenges. In this thesis, we address these challenges by introducing parameter-efficient methods for both the inference and training phases of language models.

For the inference phase, we propose a factorization-based compression framework designed to minimize the memory footprint and computational load of deploying large language models. We challenge the common use of Singular Value Decomposition (SVD) for weight factorization and instead propose an autoencoder framework with a custom loss that yields better zero-shot perplexity for the compressed model. The framework decomposes each of the model's weight matrices into multiple low-rank factors, enabling efficient inference while remaining competitive with the original model. We also demonstrate the effectiveness of several parameter-sharing schemes as well as a non-uniform (sensitivity-based) factorization scheme, both of which further boost the performance of our compression framework.

For the training phase, we present two novel parameter-efficient fine-tuning (PEFT) methods that significantly reduce the number of trainable parameters without sacrificing model performance, and we assess their generalization across various settings and tasks, where they outperform baseline approaches. In our first contribution, we study the performance of multilingual language models on cross-lingual reasoning and demonstrate that incorporating a PEFT method can significantly boost generalization across tasks and languages. Our second contribution is a novel low-rank adaptation method that, while using less than 1% of LoRA's trainable parameters (Hu et al., 2021), performs competitively with or better than LoRA across various benchmarks and model scales.
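To make the two ideas referenced above concrete, the sketch below illustrates (i) the standard truncated-SVD baseline for splitting a weight matrix into low-rank factors, which is the approach the thesis challenges, and (ii) a vanilla LoRA adapter, the reference point for the proposed low-rank adaptation method. This is a minimal illustrative sketch in PyTorch; it does not implement the thesis's autoencoder-based factorization or its sub-1%-of-LoRA method, and the function and class names are hypothetical.

```python
import torch
import torch.nn as nn

# --- Truncated-SVD weight factorization (the baseline the thesis challenges) ---
def svd_factorize(weight: torch.Tensor, rank: int):
    """Split a (d_out, d_in) weight into two factors A @ B of rank `rank`."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    sqrt_s = torch.sqrt(S[:rank])
    A = U[:, :rank] * sqrt_s            # (d_out, rank)
    B = sqrt_s[:, None] * Vh[:rank]     # (rank, d_in)
    return A, B

# Example: a 4096 x 4096 projection (~16.8M params) kept at rank 256 stores
# only 2 * 4096 * 256 ≈ 2.1M params, roughly an 8x reduction.
W = torch.randn(4096, 4096)
A, B = svd_factorize(W, rank=256)
rel_error = torch.norm(W - A @ B) / torch.norm(W)

# --- Vanilla LoRA adapter (reference point for the second PEFT contribution) ---
class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)      # only the adapter factors are trained
        d_out, d_in = base.weight.shape
        self.lora_A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(d_out, rank))  # zero init: no change at start
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Usage: wrap an existing projection, e.g. LoRALinear(nn.Linear(4096, 4096), rank=8).
```

The factorization halves (or better) the parameter count of a layer at the cost of an approximation error, while the adapter leaves the base weights untouched and trains only the two small factors; the thesis's contributions improve on both of these baselines.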