Avant-Garde: Empowering GPUs with Scaled Numeric Formats
The escalating computational and memory demands of deep neural networks have outpaced chip density improvements, making arithmetic density a key bottleneck for GPUs. Scaled numeric formats, such as FP8 and Microscaling (MX), improve arithmetic density by applying adaptive scaling factors across varying block sizes and multiple scaling hierarchies. Unfortunately, supporting diverse scaled numeric formats often forces GPUs to rely on software-based implementations, increasing instruction and register overhead and degrading performance. We propose Avant-Garde, a GPU microarchitecture that natively supports diverse scaled numeric formats by converting them into a consistent single-level internal representation. Avant-Garde integrates three components: an Operand Transformer, a hardware module that dynamically flattens multi-level scaling formats into single-level internal representations; a novel Tensor Core; and an optimized data layout that eliminates instruction and register overhead. Our evaluations show that Avant-Garde achieves up to 74% higher throughput and 44% lower execution time, while maintaining accuracy within 0.2% of conventional GPUs.
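To make the scaling idea concrete, the following is a minimal sketch of single-level block scaling in the spirit of MX: each block of elements shares one power-of-two scale, and elements are stored as small integers. All names, the block size, and the int8-style element range are illustrative assumptions, not the paper's actual encoding.

```python
import math

def mx_quantize(block, elem_max=127):
    # Illustrative block quantizer: one shared power-of-two scale
    # per block (MX-style), elements clamped to an int8-like range.
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0] * len(block)
    # Pick the shared exponent so the largest element lands near
    # the top of the element range (2^6 = 64 <= |q| < 128).
    shared_exp = math.floor(math.log2(amax)) - 6
    scale = 2.0 ** shared_exp
    q = [max(-elem_max, min(elem_max, round(x / scale))) for x in block]
    return shared_exp, q

def mx_dequantize(shared_exp, q):
    # Reconstruct approximate values from the shared scale.
    scale = 2.0 ** shared_exp
    return [v * scale for v in q]
```

A multi-level format would add a second, coarser scale over groups of blocks; flattening it to a single level, as the Operand Transformer does in hardware, amounts to folding the outer scale into each block's exponent before the Tensor Core consumes the data.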