Avant-Garde: Empowering GPUs with Scaled Numeric Formats
The escalating computational and memory demands of deep neural networks have outpaced improvements in chip density, making arithmetic density a key bottleneck for GPUs. Scaled numeric formats, such as FP8 and Microscaling (MX), improve arithmetic density by applying adaptive scaling factors across varying block sizes and multiple scaling hierarchies. Unfortunately, supporting diverse scaled numeric formats often forces GPUs to fall back on software-based implementations, which inflate instruction and register overhead and degrade performance. We propose Avant-Garde, a GPU microarchitecture that natively supports diverse scaled numeric formats by converting them into a consistent single-level internal representation. Avant-Garde integrates an Operand Transformer, a hardware module that dynamically flattens multi-level scaling formats into single-level internal representations; a novel Tensor Core; and an optimized data layout, together eliminating instruction and register overhead. Our evaluations show that Avant-Garde achieves up to 74% higher throughput and 44% lower execution time than conventional GPUs, while maintaining accuracy within 0.2%.
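To make the abstract's central mechanism concrete, the sketch below illustrates how a two-level scaled block (an outer per-block scale plus inner per-sub-block scales, in the spirit of MX-style formats) can be flattened into a single-level representation by folding the scaling levels into one effective scale. This is a minimal illustrative sketch, not the paper's design: the block and sub-block sizes, the toy 4-bit mantissa, and the function names (quantize_two_level, flatten_two_level) are all assumptions chosen for clarity.

```python
import numpy as np

BLOCK = 32     # elements sharing the level-1 scale (assumed block size)
SUBBLOCK = 8   # elements sharing a level-2 scale (assumed)

def quantize_two_level(x):
    """Quantize a BLOCK-sized vector with a per-block power-of-two scale
    (level 1) and per-sub-block power-of-two scales (level 2)."""
    s1 = 2.0 ** np.floor(np.log2(np.max(np.abs(x)) + 1e-38))  # level-1 scale
    s2, q = [], np.empty_like(x)
    for i in range(0, BLOCK, SUBBLOCK):
        sub = x[i:i + SUBBLOCK] / s1
        s = 2.0 ** np.floor(np.log2(np.max(np.abs(sub)) + 1e-38))  # level-2
        s2.append(s)
        q[i:i + SUBBLOCK] = np.round(sub / s * 8) / 8  # toy 4-bit mantissa
    return s1, np.array(s2), q

def flatten_two_level(s1, s2, q):
    """Fold both scaling levels into one effective scale per sub-block,
    yielding a single-level representation: x ~= s_eff * q."""
    return s1 * s2, q

x = np.random.randn(BLOCK)
s1, s2, q = quantize_two_level(x)
s_eff, q = flatten_two_level(s1, s2, q)
recon = np.concatenate([s * q[i:i + SUBBLOCK]
                        for i, s in zip(range(0, BLOCK, SUBBLOCK), s_eff)])
print(np.max(np.abs(x - recon)))  # residual error from the toy mantissa only
```

After flattening, downstream arithmetic sees one scale per sub-block rather than a hierarchy, which is the property the Operand Transformer exploits to present every format uniformly to the Tensor Core.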
Affiliations: Korea University · École Polytechnique Fédérale de Lausanne (EPFL) · Ewha Womans University · Yonsei University
Published: 2025-06-20 · New York, NY, USA
ISBN: 979-8-4007-1261-6
Pages: 153–165
Event name | Event acronym | Event place | Event date
International Symposium on Computer Architecture | ISCA '25 | Tokyo, Japan | 2025-06-21 – 2025-06-25