Infoscience
EPFL, École polytechnique fédérale de Lausanne
research article

Overflow-free Compute Memories for Edge AI Acceleration

Ponzina, Flavio • Rios, Marco Antonio • Levisse, Alexandre Sébastien Julien • Ansaloni, Giovanni • Atienza, David

October 1, 2023
ACM Transactions on Embedded Computing Systems

Compute memories are memory arrays augmented with dedicated logic to support arithmetic. They support the efficient execution of data-centric computing patterns, such as those characterizing Artificial Intelligence (AI) algorithms. These architectures can provide computing capabilities as part of the memory array structures (In-Memory Computing, IMC) or at their immediate periphery (Near-Memory Computing, NMC). By bringing the processing elements inside (or very close to) storage, compute memories minimize the cost of data access. Moreover, highly parallel (and, hence, high-performance) computations are enabled by exploiting the regular structure of memory arrays. However, the regular layout of memory elements also constrains the data range of inputs and outputs, since the bitwidths of operands and results stored at each address cannot be freely varied. Addressing this challenge, we herein propose a HW/SW co-design methodology combining careful per-layer quantization and inter-layer scaling with lightweight hardware support for overflow-free computation of dot-vector operations. We demonstrate its use to implement the convolutional and fully connected layers of AI models. We embody our strategy in two implementations, based on IMC and NMC, respectively. Experimental results highlight that an area overhead of only 10.5% (for IMC) and 12.9% (for NMC) is required when interfacing with a 2KB subarray. Furthermore, inferences on benchmark CNNs show negligible accuracy degradation due to quantization with respect to equivalent floating-point implementations.
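The overflow problem the abstract describes can be sketched in a few lines: the product of two fixed-bitwidth operands grows by the sum of the operand widths, and an n-term dot product adds up to log2(n) further bits, so a per-layer right-shift (one simple form of inter-layer scaling) can be chosen ahead of time to keep results within the array's storage width. The Python sketch below is illustrative only and is not the paper's actual hardware mechanism; the 16-bit accumulator width, the 8-bit operand widths, and the shift-based scaling are assumptions for the example.

```python
import numpy as np

ACC_BITS = 16  # assumed fixed result bitwidth of the memory subarray


def worst_case_bits(n_terms, act_bits=8, w_bits=8):
    """Worst-case bit growth of an n-term dot product:
    each product needs up to act_bits + w_bits bits, and
    summing n terms adds up to ceil(log2(n)) carry bits."""
    return act_bits + w_bits + int(np.ceil(np.log2(n_terms)))


def overflow_free_dot(acts, weights, act_bits=8, w_bits=8):
    """Dot product with a per-layer right-shift chosen statically
    so the stored result always fits in ACC_BITS bits."""
    needed = worst_case_bits(len(weights), act_bits, w_bits)
    shift = max(0, needed - ACC_BITS)  # inter-layer scaling factor
    acc = np.dot(acts.astype(np.int64), weights.astype(np.int64))
    return acc >> shift, shift
```

For a 64-term dot product of 8-bit operands, the worst case needs 8 + 8 + 6 = 22 bits, so a shift of 6 suffices to fit a 16-bit result; the shift is known per layer at compile time, which is what makes lightweight hardware support plausible.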

Details
Type
research article
DOI
10.1145/3609387
Web of Science ID

WOS:001074334300024

Author(s)
Ponzina, Flavio  
Rios, Marco Antonio  
Levisse, Alexandre Sébastien Julien  
Ansaloni, Giovanni  
Atienza, David  
Date Issued

2023-10-01

Publisher
Association for Computing Machinery
Published in
ACM Transactions on Embedded Computing Systems
Volume
22
Issue
5
Subjects
Technology • In-Memory Computing • Near-Memory Computing • Edge Machine Learning • Quantization • Convolutional Neural Networks
Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
ESL
Funder: Grant Number
EC H2020 WiPLASH: 863337
EC H2020 FVLLMONTI: 101016776
ACCESS - AI Chip Center for Emerging Smart Systems - InnoHK funding, Hong Kong SAR

Available on Infoscience
February 14, 2024
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/203695