Stop Wasting Your Cache! Bringing Machine Learning into Cache Computing
The rapid evolution of Machine Learning (ML) workloads, particularly Deep Neural Networks (DNNs) and Transformer-based models, has intensified demands on computing architectures, exposing the memory bottleneck that limits traditional von Neumann systems. To address these challenges, this paper investigates the mapping of fundamental ML operations onto ARCANE, a Near-Memory Computing (NMC) architecture that integrates Vector Processing Units (VPUs) directly within the data cache. ARCANE offers a flexible ISA extension (xmnmc) that abstracts memory management, effectively reducing data movement and enhancing performance. We specifically explore ARCANE's acceleration capabilities when executing fundamental DNN and Transformer operations. Experimental results show that, with a contained area overhead, ARCANE achieves consistent speedups over conventional CPU execution: up to 150× for 2D convolution, 305× for the Linear layer, and over 32× for Fused-Weight Self-Attention (FWSA). These findings underline ARCANE's benefits for the efficient deployment of edge-oriented ML workloads.
Affiliations: Polytechnic University of Turin; EPFL
Event: CF '25 Companion, Cagliari, Italy, 2025-05-28 to 2025-05-30
Publisher: ACM, New York, NY, USA
Publication date: 2025-07-07
ISBN: 979-8-4007-1393-4
Pages: 86-89
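
As a concrete illustration of the programming model the abstract describes, the sketch below contrasts a plain CPU implementation of a Linear layer (one of the benchmarked kernels) with how an offload through an xmnmc-style call might look. This is a minimal sketch under stated assumptions: `arcane_matvec_i8` and `arcane_sync` are hypothetical placeholder names, not the paper's actual interface, which is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline: naive CPU Linear layer, y = W*x + b, with every operand
 * streamed through the core and the memory hierarchy. */
static void linear_cpu(const int8_t *W, const int8_t *x, const int32_t *b,
                       int32_t *y, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; r++) {
        int32_t acc = b[r];
        for (size_t c = 0; c < cols; c++)
            acc += (int32_t)W[r * cols + c] * (int32_t)x[c];
        y[r] = acc;
    }
}

/* HYPOTHETICAL xmnmc-style entry points: prototypes only, standing in for
 * the extension's instructions. The idea they illustrate is that the
 * matrix-vector product runs on the vector units inside the data cache,
 * so W and x need not be moved into the core's registers. */
extern void arcane_matvec_i8(const int8_t *W, const int8_t *x,
                             const int32_t *b, int32_t *y,
                             size_t rows, size_t cols);
extern void arcane_sync(void);

static void linear_nmc(const int8_t *W, const int8_t *x, const int32_t *b,
                       int32_t *y, size_t rows, size_t cols)
{
    arcane_matvec_i8(W, x, b, y, rows, cols); /* in-cache compute (placeholder) */
    arcane_sync();                            /* wait for completion (placeholder) */
}
```

The contrast is the point of the sketch: in the baseline, the loop nest forces all weight traffic through the core, whereas in the offloaded version the memory layout and movement are abstracted away by the extension, which is where the reported speedups would come from.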