Stop Wasting Your Cache! Bringing Machine Learning into Cache Computing
The rapid evolution of Machine Learning (ML) workloads, particularly Deep Neural Networks (DNNs) and Transformer-based models, has intensified demands on computing architectures, exposing the memory bottleneck that limits traditional von Neumann systems. To address these challenges, this paper investigates the mapping of fundamental ML operations onto ARCANE, a Near-Memory Computing (NMC) architecture that integrates Vector Processing Units (VPUs) directly within the data cache. ARCANE offers a flexible ISA extension (xmnmc) that abstracts memory management, effectively reducing data movement and enhancing performance. We specifically explore ARCANE's acceleration capabilities when executing fundamental DNN and Transformer operations. Experimental results show that, with a contained area overhead, ARCANE achieves consistent speedups over conventional CPU execution: up to 150× for 2D convolution, 305× for the Linear layer, and over 32× for Fused-Weight Self-Attention (FWSA). These findings underline ARCANE's benefits for the efficient deployment of edge-oriented ML workloads.
Affiliations: Polytechnic University of Turin; EPFL
Event: CF '25 Companion, Cagliari, Italy, 2025-05-28 to 2025-05-30
Publisher: ACM, New York, NY, USA
Publication date: 2025-07-07
ISBN: 979-8-4007-1393-4
Pages: 86-89
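
As a concrete illustration of the programming model the abstract describes, the sketch below contrasts a plain CPU implementation of a Linear layer (one of the benchmarked kernels) with how an offload through an xmnmc-style call might look. This is a minimal sketch under stated assumptions: `arcane_matvec_i8` and `arcane_sync` are hypothetical placeholder names, not the paper's actual interface, which is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline: naive CPU Linear layer, y = W*x + b, with every operand
 * streamed through the core and the memory hierarchy. */
static void linear_cpu(const int8_t *W, const int8_t *x, const int32_t *b,
                       int32_t *y, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; r++) {
        int32_t acc = b[r];
        for (size_t c = 0; c < cols; c++)
            acc += (int32_t)W[r * cols + c] * (int32_t)x[c];
        y[r] = acc;
    }
}

/* HYPOTHETICAL xmnmc-style entry points: prototypes only, standing in for
 * the extension's instructions. The idea they illustrate is that the
 * matrix-vector product runs on the vector units inside the data cache,
 * so W and x need not be moved into the core's registers. */
extern void arcane_matvec_i8(const int8_t *W, const int8_t *x,
                             const int32_t *b, int32_t *y,
                             size_t rows, size_t cols);
extern void arcane_sync(void);

static void linear_nmc(const int8_t *W, const int8_t *x, const int32_t *b,
                       int32_t *y, size_t rows, size_t cols)
{
    arcane_matvec_i8(W, x, b, y, rows, cols); /* in-cache compute (placeholder) */
    arcane_sync();                            /* wait for completion (placeholder) */
}
```

The contrast is the point of the sketch: in the baseline, the loop nest forces all weight traffic through the core, whereas in the offloaded version the memory layout and movement are abstracted away by the extension, which is where the reported speedups would come from.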