Infoscience — EPFL, École polytechnique fédérale de Lausanne
 
Conference paper

Efficient Large Language Model Inference with Neural Block Linearization

Erdogan, Mete • Tonin, Francesco • Cevher, Volkan

December 2025
39th Conference on Neural Information Processing Systems (NeurIPS 2025) [forthcoming publication]

The high inference demands of transformer-based Large Language Models (LLMs) pose substantial challenges to their deployment. To this end, we introduce Neural Block Linearization (NBL), a novel framework for accelerating transformer model inference by replacing self-attention layers with linear approximations derived from Linear Minimum Mean Squared Error (LMMSE) estimators. NBL leverages Canonical Correlation Analysis to compute a theoretical upper bound on the approximation error. We then use this bound as a substitution criterion, selecting the LLM layers with the lowest linearization error. NBL can be applied efficiently to pretrained LLMs without fine-tuning. In experiments, NBL achieves notable computational speed-ups while preserving competitive accuracy on multiple reasoning benchmarks. For instance, applying NBL to 12 self-attention layers of DeepSeek-R1-Distill-Llama-8B increases inference speed by 32% with less than a 1% accuracy trade-off, making it a flexible and promising approach for improving the inference efficiency of LLMs. The implementation is available at: https://github.com/LIONS-EPFL/NBL.
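The core recipe the abstract describes — fit a linear map from a block's inputs to its outputs on calibration activations, score each block by its linearization error, and replace the best-approximated blocks — can be sketched as below. This is a hypothetical illustration, not the authors' implementation (see the linked repository); the function names, the ridge term, and the relative-residual score are assumptions, and the paper's CCA-based error bound is replaced here by a plain empirical residual for brevity.

```python
import numpy as np

def lmmse_fit(X, Y, ridge=1e-6):
    """Linear MMSE estimator Y ~= X @ W + b from paired activations.

    X: (n_samples, d_in) block inputs; Y: (n_samples, d_out) block outputs.
    Minimizes E||Y - (X W + b)||^2, with a small ridge term for stability.
    """
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu_x, Y - mu_y
    cov_xx = Xc.T @ Xc / len(X) + ridge * np.eye(X.shape[1])
    cov_xy = Xc.T @ Yc / len(X)
    W = np.linalg.solve(cov_xx, cov_xy)   # (d_in, d_out)
    b = mu_y - mu_x @ W
    return W, b

def linearization_error(X, Y, W, b):
    """Relative residual of the linear substitute (proxy for the paper's bound)."""
    R = Y - (X @ W + b)
    return np.linalg.norm(R) / np.linalg.norm(Y)

# Toy usage: score two simulated "blocks" on calibration data and pick the
# one closest to linear as the substitution candidate.
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 16))
Y_near_linear = X @ rng.normal(size=(16, 16)) + 0.01 * rng.normal(size=(512, 16))
Y_nonlinear = np.tanh(3 * X) + rng.normal(size=(512, 16))

scores = {}
for name, Y in [("block_a", Y_near_linear), ("block_b", Y_nonlinear)]:
    W, b = lmmse_fit(X, Y)
    scores[name] = linearization_error(X, Y, W, b)

best = min(scores, key=scores.get)  # lowest error -> linearize this block first
```

In an actual transformer, `X` and `Y` would be the hidden states entering and leaving a self-attention block, collected with forward hooks on a calibration set, and the chosen blocks would be swapped for the fitted affine maps at inference time.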

Files

Name: 7134_Efficient_Large_Language_.pdf
Type: Main Document
Version: Accepted version
Access type: openaccess
License Condition: N/A
Size: 2.55 MB
Format: Adobe PDF
Checksum (MD5): 53e9424f813c7ce1aff341b670c19f1e

Contact: infoscience@epfl.ch

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.