doctoral thesis

Parameter-efficient Methods for Training and Inference of Transformer-based Models

Banaei, Mohammadreza  
2025

Recent advancements in natural language processing, driven by the success of large language models, have led to the scaling up of models to unprecedented sizes. While these models have achieved remarkable performance across various tasks, their computational and memory demands pose significant challenges. In this thesis, we address these challenges by introducing parameter-efficient methods for the inference and training phases of language models. To address parameter efficiency at inference time, we propose a factorization-based compression framework designed to minimize the memory footprint and computational load of deploying large language models. We challenge the common use of Singular Value Decomposition (SVD) for weight factorization and propose an autoencoder framework with a custom loss that yields better zero-shot perplexity for the compressed model. This compression framework decomposes each model weight into multiple low-rank factors, enabling efficient inference while remaining competitive with the original model. We also demonstrate the effectiveness of various parameter-sharing schemes as well as a non-uniform (sensitivity-based) factorization scheme, both of which further boost the performance of our compression framework. Regarding parameter-efficient training, we present two novel parameter-efficient fine-tuning (PEFT) methods that significantly reduce the number of trainable parameters without sacrificing model performance. The generalization of these methods is assessed across various settings and tasks, demonstrating superior performance compared to baseline approaches. For our first PEFT contribution, we study the performance of multilingual language models on the cross-lingual reasoning task and demonstrate that incorporating a PEFT method can significantly boost the model's generalization across tasks and languages. Our second contribution is a novel low-rank adaptation method that, while using less than 1% of LoRA's trainable parameters (Hu et al., 2021), performs on par with or better than LoRA across various benchmarks and model scales.
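
For readers skimming the abstract, the two core ideas it describes (storing a weight matrix as a product of low-rank factors for compressed inference, and training only a small low-rank update on top of a frozen weight, as in LoRA) can be sketched in a few lines of PyTorch. The sketch below is illustrative only: the class names, ranks, and plain truncated-SVD initialization are assumptions made for this example and do not reproduce the thesis's methods; in particular, the thesis argues for an autoencoder-based alternative to SVD.

# A minimal, illustrative sketch (not the thesis implementation), assuming PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


def svd_low_rank_factors(weight: torch.Tensor, rank: int):
    """Return factors A (out, rank) and B (rank, in) with weight ~= A @ B."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # absorb singular values into the left factor
    B = Vh[:rank, :]
    return A, B


class FactorizedLinear(nn.Module):
    """Linear layer whose weight is stored only as two low-rank factors.

    Plain truncated SVD is used for initialization here; the thesis instead
    proposes an autoencoder framework with a custom loss.
    """

    def __init__(self, linear: nn.Linear, rank: int):
        super().__init__()
        A, B = svd_low_rank_factors(linear.weight.data, rank)
        self.A, self.B = nn.Parameter(A), nn.Parameter(B)
        self.bias = linear.bias

    def forward(self, x):
        # x @ B^T @ A^T == x @ (A @ B)^T, without materializing the full weight.
        return F.linear(F.linear(x, self.B), self.A, self.bias)


class LoRALinear(nn.Module):
    """Frozen linear layer plus a small trainable low-rank update (LoRA-style)."""

    def __init__(self, linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = linear
        self.base.weight.requires_grad_(False)
        out_features, in_features = linear.weight.shape
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)  # down-projection
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))        # up-projection, zero init
        self.scaling = alpha / rank

    def forward(self, x):
        delta = F.linear(F.linear(x, self.lora_A), self.lora_B)
        return self.base(x) + self.scaling * delta


if __name__ == "__main__":
    layer = nn.Linear(768, 768)
    compressed = FactorizedLinear(layer, rank=64)
    adapted = LoRALinear(layer, rank=8)
    x = torch.randn(2, 768)
    print(compressed(x).shape, adapted(x).shape)
    # Parameter counts: 768*768 = 589,824 vs. 2*768*64 = 98,304 at rank 64.
    print(layer.weight.numel(), compressed.A.numel() + compressed.B.numel())

At rank 64, the factorized 768x768 layer stores roughly one sixth of the original parameters; choosing that rank per layer is the kind of trade-off a sensitivity-based, non-uniform factorization scheme would adjust.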

Type
doctoral thesis
DOI
10.5075/epfl-thesis-9867
Author(s)
Banaei, Mohammadreza  

EPFL

Advisors
Aberer, Karl  
Jury

Prof. Nicolas Henri Bernard Flammarion (jury president); Prof. Karl Aberer (thesis director); Prof. Martin Jaggi, Dr James Henderson, Dr Damien Teney (examiners)

Date Issued
2025
Publisher
EPFL
Publisher place
Lausanne
Public defense date
2025-02-07
Thesis number
9867
Number of pages
132

Subjects
compression • parameter-efficient • transformer • factorization • large language models • LoRA • multilinguality
EPFL units
LSIR  
Faculty
IC  
School
IINFCOM  
Doctoral School
EDIC  
Available on Infoscience
February 3, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/246394