Infoscience (EPFL)
conference paper not in proceedings

Constrained bit allocation for neural networks

Boudouh, Souleyman • Harma, Simla Burcu • Mahmoud, Abdulrahman • Falsafi, Babak
May 21, 2025
Machine Learning for Computer Architecture and Systems 2025

The increasing complexity of deep neural networks (DNNs) necessitates effective model compression to reduce their computational and memory footprints for deployment on resource-constrained hardware. Layer-wise bit allocation is a prominent compression method shown to significantly reduce DNN footprints while preserving model accuracy. However, how best to incorporate hardware constraints within the allocation search remains a key question: many methods tacitly assume constraints can be adequately handled via soft penalties or heuristics, which often fail to guarantee feasibility or optimality. In this paper, we explore reformulating bit allocation as an explicit constrained optimization problem, solved using interior-point methods within a NAS-based framework, notably requiring only minimal calibration data (as few as 128 samples). We corroborate this approach with experiments spanning transformer architectures (Llama, Gemma, Qwen; 500M-3B parameters), evaluating performance with MXFP formats. We show that this constrained formulation not only achieves significantly finer resolution in compression ratios than the discrete steps offered by uniform MXFP application (e.g., 4.25, 6.25, 8.25 bits), but also that explicitly satisfying hardware budgets while optimizing for accuracy consistently outperforms uniform allocation methods, improving performance by up to several standard deviations in some cases, especially under strict resource limits. Our findings extend to the efficient deployment of large models on resource-constrained compute platforms, offering insights into best practices for applying bit allocation to maximize hardware resource efficiency without unduly compromising accuracy.
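
For context, the fractional bit widths quoted above follow from the OCP Microscaling (MX) convention: each block of 32 elements shares one 8-bit exponent, so MXFP4, MXFP6, and MXFP8 cost 4 + 8/32 = 4.25, 6 + 8/32 = 6.25, and 8 + 8/32 = 8.25 bits per element, respectively. The sketch below illustrates the core idea of the abstract, treating layer-wise bit allocation as an explicit constrained optimization problem with a hard budget, solved by an interior-point method. The quadratic-noise proxy objective, the per-layer sensitivities, and the choice of SciPy's 'trust-constr' solver are illustrative assumptions, not the paper's NAS-based formulation.

```python
# Hypothetical sketch: layer-wise bit allocation under a hard average-bit
# budget, solved with SciPy's trust-region interior-point solver
# ('trust-constr'). Layer sizes and sensitivities below are made up.
import numpy as np
from scipy.optimize import minimize, LinearConstraint, Bounds

params = np.array([1.2e8, 3.4e8, 3.4e8, 2.1e8])  # parameters per layer (hypothetical)
sens = np.array([4.0, 1.0, 1.5, 2.5])            # quantization sensitivity per layer (hypothetical)
weights = params / params.sum()                  # each layer's share of the model footprint
budget = 6.25                                    # target average bits per parameter

def quant_loss(bits):
    # Classic rate-distortion proxy: quantization noise power shrinks as 2^(-2b).
    return float(np.sum(sens * 2.0 ** (-2.0 * bits)))

# Hard constraint: the parameter-weighted average bit width must not exceed the budget.
avg_bits = LinearConstraint(weights[np.newaxis, :], lb=-np.inf, ub=budget)
box = Bounds(4.0, 8.0)  # allocate between 4 and 8 bits per layer

res = minimize(quant_loss, x0=np.full(4, 6.0), method="trust-constr",
               constraints=[avg_bits], bounds=box)
print(np.round(res.x, 2))      # continuous per-layer bit widths
print(float(weights @ res.x))  # achieved average, never above the budget
```

Because the budget enters as a hard constraint rather than a soft penalty, any solution the solver returns is feasible by construction, which is the guarantee the abstract contrasts with penalty-based heuristics. In practice the continuous solution would still have to be mapped onto the discrete MXFP formats available per layer.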

Details
Type
conference paper not in proceedings
Author(s)
Boudouh, Souleyman (EPFL)
Harma, Simla Burcu (EPFL)
Mahmoud, Abdulrahman
Falsafi, Babak (EPFL)
Date Issued
2025-05-21
Subjects
Neural networks • Compression • Numerical formats • Block floating point • Microexponents • MXFP • Interior point methods • Constrained optimization • Bit allocation • Mixed precision
Written at
EPFL
EPFL units
PARSA
Event name
Machine Learning for Computer Architecture and Systems 2025
Event acronym
MLArchSys 2025
Event place
Tokyo, Japan
Event date
2025-06-21

Available on Infoscience
August 8, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/252852