Layer-wise Quantization for Quantized Optimistic Dual Averaging
Modern deep neural networks exhibit heterogeneity across their many layers of different types, such as residual and multi-head attention blocks, owing to varying structures (dimensions, activation functions, etc.) and distinct representation characteristics, which together impact predictions. We develop a general layer-wise quantization framework with tight variance and code-length bounds that adapts to these heterogeneities over the course of training. We then apply this layer-wise quantization technique to distributed variational inequalities (VIs), proposing a novel Quantized Optimistic Dual Averaging (QODA) algorithm with adaptive learning rates that achieves competitive convergence rates for monotone VIs. Empirically, QODA achieves up to a 150% speedup over the baselines in end-to-end training time when training a Wasserstein GAN on 12+ GPUs.
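To make the layer-wise idea concrete, below is a minimal sketch of per-layer unbiased stochastic quantization, where each layer is scaled by its own norm and assigned its own number of quantization levels. The level counts, layer names, and quantizer form are illustrative assumptions, not the paper's exact scheme or bounds.

```python
import numpy as np

def quantize_layer(grad, num_levels):
    """Uniform stochastic quantization of one layer's gradient.

    Scaling by the layer's own norm lets each layer adapt to its own
    magnitude (the layer-wise idea); num_levels controls that layer's
    variance / code-length trade-off. Randomized rounding keeps the
    quantizer unbiased. (Illustrative sketch, not the paper's scheme.)
    """
    norm = np.linalg.norm(grad)
    if norm == 0:
        return np.zeros_like(grad)
    scaled = np.abs(grad) / norm * num_levels        # values in [0, num_levels]
    lower = np.floor(scaled)
    prob_up = scaled - lower                         # round up with this probability
    levels = lower + (np.random.rand(*grad.shape) < prob_up)
    return np.sign(grad) * norm * levels / num_levels

# Hypothetical example: heterogeneous layers get different level counts.
grads = {"attention": np.random.randn(512), "residual": np.random.randn(2048)}
levels_per_layer = {"attention": 4, "residual": 16}
quantized = {name: quantize_layer(g, levels_per_layer[name])
             for name, g in grads.items()}
```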