doctoral thesis

Parameter-efficient Methods for Training and Inference of Transformer-based Models

Banaei, Mohammadreza  
2025

Recent advancements in natural language processing, driven by the success of large language models, have led to the scaling up of models to unprecedented sizes. While these models have achieved remarkable performance across various tasks, their computational and memory demands pose significant challenges. In this thesis, we address these challenges by introducing parameter-efficient methods for the inference and training phases of language models. To address parameter efficiency at inference time, we propose a factorization-based compression framework designed to minimize the memory footprint and computational load of deploying large language models. We challenge the common use of Singular Value Decomposition (SVD) for weight factorization and propose an autoencoder framework with a custom loss that yields better zero-shot perplexity for the compressed model. This compression framework decomposes each model weight into multiple low-rank factors, enabling efficient inference while remaining competitive with the original model. We also demonstrate the effectiveness of various parameter-sharing schemes as well as a non-uniform (sensitivity-based) factorization scheme, both of which further boost the performance of our compression framework. Regarding parameter-efficient training, we present two novel parameter-efficient fine-tuning (PEFT) methods that significantly reduce the number of trainable parameters without sacrificing model performance. The generalization of these methods is assessed across various settings and tasks, demonstrating superior performance compared to baseline approaches. For our first PEFT contribution, we study the performance of multilingual language models on the cross-lingual reasoning task and demonstrate that incorporating a PEFT method can significantly boost the model's generalization across tasks and languages. Our second contribution is a novel low-rank adaptation method that, while using less than 1% of LoRA's trainable parameters (Hu et al., 2021), performs on par with or better than LoRA across various benchmarks and model scales.
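
For readers skimming the abstract, the two core ideas it describes (storing a weight matrix as a product of low-rank factors for compressed inference, and training only a small low-rank update on top of a frozen weight, as in LoRA) can be sketched in a few lines of PyTorch. The sketch below is illustrative only: the class names, ranks, and plain truncated-SVD initialization are assumptions made for this example and do not reproduce the thesis's methods; in particular, the thesis argues for an autoencoder-based alternative to SVD.

# A minimal, illustrative sketch (not the thesis implementation), assuming PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


def svd_low_rank_factors(weight: torch.Tensor, rank: int):
    """Return factors A (out, rank) and B (rank, in) with weight ~= A @ B."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # absorb singular values into the left factor
    B = Vh[:rank, :]
    return A, B


class FactorizedLinear(nn.Module):
    """Linear layer whose weight is stored only as two low-rank factors.

    Plain truncated SVD is used for initialization here; the thesis instead
    proposes an autoencoder framework with a custom loss.
    """

    def __init__(self, linear: nn.Linear, rank: int):
        super().__init__()
        A, B = svd_low_rank_factors(linear.weight.data, rank)
        self.A, self.B = nn.Parameter(A), nn.Parameter(B)
        self.bias = linear.bias

    def forward(self, x):
        # x @ B^T @ A^T == x @ (A @ B)^T, without materializing the full weight.
        return F.linear(F.linear(x, self.B), self.A, self.bias)


class LoRALinear(nn.Module):
    """Frozen linear layer plus a small trainable low-rank update (LoRA-style)."""

    def __init__(self, linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = linear
        self.base.weight.requires_grad_(False)
        out_features, in_features = linear.weight.shape
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)  # down-projection
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))        # up-projection, zero init
        self.scaling = alpha / rank

    def forward(self, x):
        delta = F.linear(F.linear(x, self.lora_A), self.lora_B)
        return self.base(x) + self.scaling * delta


if __name__ == "__main__":
    layer = nn.Linear(768, 768)
    compressed = FactorizedLinear(layer, rank=64)
    adapted = LoRALinear(layer, rank=8)
    x = torch.randn(2, 768)
    print(compressed(x).shape, adapted(x).shape)
    # Parameter counts: 768*768 = 589,824 vs. 2*768*64 = 98,304 at rank 64.
    print(layer.weight.numel(), compressed.A.numel() + compressed.B.numel())

At rank 64, the factorized 768x768 layer stores roughly one sixth of the original parameters; choosing that rank per layer is the kind of trade-off a sensitivity-based, non-uniform factorization scheme would adjust.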

Type
doctoral thesis
DOI
10.5075/epfl-thesis-9867
Author(s)
Banaei, Mohammadreza  

EPFL

Advisors
Aberer, Karl  
Jury

Prof. Nicolas Henri Bernard Flammarion (jury president); Prof. Karl Aberer (thesis director); Prof. Martin Jaggi, Dr James Henderson, Dr Damien Teney (examiners)

Date Issued
2025
Publisher
EPFL
Publisher place
Lausanne
Public defense date
2025-02-07
Thesis number
9867
Number of pages
132

Subjects
compression • parameter-efficient • transformer • factorization • large language models • LoRA • multilinguality
EPFL units
LSIR  
Faculty
IC  
School
IINFCOM  
Doctoral School
EDIC  
Available on Infoscience
February 3, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/246394