Avant-Garde: Empowering GPUs with Scaled Numeric Formats
The escalating computational and memory demands of deep neural networks have outpaced improvements in chip density, making arithmetic density a key bottleneck for GPUs. Scaled numeric formats, such as FP8 and Microscaling (MX), improve arithmetic density by applying adaptive scaling factors across varying block sizes and multiple scaling hierarchies. Unfortunately, supporting diverse scaled numeric formats often forces GPUs to fall back on software-based implementations, which inflate instruction and register overhead and degrade performance. We propose Avant-Garde, a GPU microarchitecture that natively supports diverse scaled numeric formats by converting them into a consistent single-level internal representation. Avant-Garde integrates an Operand Transformer, a hardware module that dynamically flattens multi-level scaling formats into single-level internal representations; a novel Tensor Core; and an optimized data layout, together eliminating instruction and register overhead. Our evaluations show that Avant-Garde achieves up to 74% higher throughput and 44% lower execution time than conventional GPUs, while maintaining accuracy within 0.2%.
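To make the abstract's central mechanism concrete, the sketch below illustrates how a two-level scaled block (an outer per-block scale plus inner per-sub-block scales, in the spirit of MX-style formats) can be flattened into a single-level representation by folding the scaling levels into one effective scale. This is a minimal illustrative sketch, not the paper's design: the block and sub-block sizes, the toy 4-bit mantissa, and the function names (quantize_two_level, flatten_two_level) are all assumptions chosen for clarity.

```python
import numpy as np

BLOCK = 32     # elements sharing the level-1 scale (assumed block size)
SUBBLOCK = 8   # elements sharing a level-2 scale (assumed)

def quantize_two_level(x):
    """Quantize a BLOCK-sized vector with a per-block power-of-two scale
    (level 1) and per-sub-block power-of-two scales (level 2)."""
    s1 = 2.0 ** np.floor(np.log2(np.max(np.abs(x)) + 1e-38))  # level-1 scale
    s2, q = [], np.empty_like(x)
    for i in range(0, BLOCK, SUBBLOCK):
        sub = x[i:i + SUBBLOCK] / s1
        s = 2.0 ** np.floor(np.log2(np.max(np.abs(sub)) + 1e-38))  # level-2
        s2.append(s)
        q[i:i + SUBBLOCK] = np.round(sub / s * 8) / 8  # toy 4-bit mantissa
    return s1, np.array(s2), q

def flatten_two_level(s1, s2, q):
    """Fold both scaling levels into one effective scale per sub-block,
    yielding a single-level representation: x ~= s_eff * q."""
    return s1 * s2, q

x = np.random.randn(BLOCK)
s1, s2, q = quantize_two_level(x)
s_eff, q = flatten_two_level(s1, s2, q)
recon = np.concatenate([s * q[i:i + SUBBLOCK]
                        for i, s in zip(range(0, BLOCK, SUBBLOCK), s_eff)])
print(np.max(np.abs(x - recon)))  # residual error from the toy mantissa only
```

After flattening, downstream arithmetic sees one scale per sub-block rather than a hierarchy, which is the property the Operand Transformer exploits to present every format uniformly to the Tensor Core.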
Affiliations: Korea University · École Polytechnique Fédérale de Lausanne (EPFL) · Ewha Womans University · Yonsei University
Published: 2025-06-20 · New York, NY, USA
ISBN: 979-8-4007-1261-6
Pages: 153–165
Event name | Event acronym | Event place | Event date
International Symposium on Computer Architecture | ISCA '25 | Tokyo, Japan | 2025-06-21 – 2025-06-25