research article

Which Coupled is Best Coupled? An Exploration of AIMC Tile Interfaces and Load Balancing for CNNs

Klein, Joshua Alexander Harrison
•
Boybat, Irem
•
Ansaloni, Giovanni
•
Atienza, David
•
Zapater Sancho, Marina
August 1, 2024
IEEE Transactions on Parallel and Distributed Systems

Due to stringent energy and performance constraints, edge AI computing often employs heterogeneous systems that utilize both general-purpose CPUs and accelerators. Analog in-memory computing (AIMC) is a well-known AI inference solution that overcomes computational bottlenecks by performing matrix-vector multiplication operations (MVMs) in constant time. However, the tiles of AIMC-based accelerators are limited by the number of weights they can hold. State-of-the-art research often sizes neural networks to AIMC tiles (or vice-versa), but does not consider cases where AIMC tiles cannot cover the whole network due to lack of tile resources or the network size. In this work, we study the trade-offs of available AIMC tile resources, neural network coverage, AIMC tile proximity to compute resources, and multi-core load balancing techniques. We first perform a study of single-layer performance and energy scalability of AIMC tiles in the two most typical AIMC acceleration targets: dense/fully-connected layers and convolutional layers. This study guides the methodology with which we approach parameter allocation to AIMC tiles in the context of large edge neural networks, both where AIMC tiles are close to the CPU (tightly-coupled) and cannot share resources across the system, and where AIMC tiles are far from the CPU (loosely-coupled) and can employ workload stealing. We explore the performance and energy trends of six modern CNNs using different methods of load balancing for differently-coupled system configurations with variable AIMC tile resources. We show that, by properly distributing workloads, AIMC acceleration can be made highly effective even on under-provisioned systems. As an example, 5.9x speedup and 5.6x energy gains were measured on an 8-core system, for a 41% coverage of neural network parameters.
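To make the allocation problem in the abstract concrete, the sketch below maps CNN layer weights onto a limited budget of fixed-capacity AIMC tiles and reports the resulting parameter coverage. All layer sizes, tile capacities, and the greedy first-fit policy are illustrative assumptions, not the paper's actual methodology.

```python
# Illustrative sketch: allocate CNN layer weights to a limited budget of
# fixed-capacity AIMC tiles and compute parameter coverage. Sizes,
# capacities, and the greedy policy are assumptions for illustration.

layer_params = {"conv1": 1728, "conv2": 36864, "fc1": 262144, "fc2": 40960}

TILE_CAPACITY = 65536  # weights one AIMC tile can hold (assumed)
NUM_TILES = 4          # available tile budget (assumed)

def allocate_tiles(layers, capacity, num_tiles):
    """Greedy first-fit, largest layers first. Layers that fit on no
    tile fall back to CPU execution (mapped to None)."""
    mapping = {}
    free = [capacity] * num_tiles
    for name, size in sorted(layers.items(), key=lambda kv: -kv[1]):
        for tile, room in enumerate(free):
            if size <= room:
                mapping[name] = tile
                free[tile] -= size
                break
        else:
            mapping[name] = None  # not covered: executes on the CPU
    covered = sum(s for n, s in layers.items() if mapping[n] is not None)
    return mapping, covered

mapping, covered = allocate_tiles(layer_params, TILE_CAPACITY, NUM_TILES)
coverage = covered / sum(layer_params.values())
print(mapping, f"coverage = {coverage:.0%}")
```

Under these assumed sizes the large fully-connected layer exceeds any single tile and stays on the CPU, which mirrors the under-provisioned scenario the paper studies; the authors' load-balancing and workload-stealing schemes address exactly this partial-coverage case.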

Type
research article
DOI
10.1109/TPDS.2024.3437657
Author(s)
Klein, Joshua Alexander Harrison (IMEC)
Boybat, Irem
Ansaloni, Giovanni (École Polytechnique Fédérale de Lausanne)
Atienza, David (École Polytechnique Fédérale de Lausanne)
Zapater Sancho, Marina (HES-SO University of Applied Sciences and Arts Western Switzerland)

Date Issued
2024-08-01
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Published in
IEEE Transactions on Parallel and Distributed Systems
Start page
1
End page
15
Editorial or Peer reviewed
REVIEWED
Written at
EPFL
EPFL units
ESL
Funder and Grant Number
EC H2020 WiPLASH project: 863337
EC H2020 FVLLMONTI project: 101016776
ACCESS - AI Chip Center for Emerging Smart Systems

Available on Infoscience
August 7, 2024
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/240639