conference paper

A Hybrid Cache HW/SW Stack for Optimizing Neural Network Runtime, Power and Endurance

Simon, William Andrew • Levisse, Alexandre Sébastien Julien • Zapater Sancho, Marina • Atienza Alonso, David

2020

28th IFIP/IEEE International Conference on Very Large Scale Integration

Hybrid caches consisting of both SRAM and emerging Non-Volatile Random Access Memory (eNVRAM) bitcells increase cache capacity and reduce power consumption by taking advantage of eNVRAM's small area footprint and low leakage energy. However, they also inherit eNVRAM's drawbacks, including long write latency and limited endurance. To mitigate these drawbacks, many works propose heuristic strategies that allocate memory blocks to SRAM or eNVRAM arrays at runtime based on block content or access pattern. In contrast, this work presents a HW/SW Stack for Hybrid Caches (SHyCache), consisting of a hybrid cache architecture and a supporting programming model reminiscent of those that enable GP-GPU acceleration. Application variables can be allocated explicitly to the eNVRAM cache, eliminating the need for heuristics and reducing cache access time, power consumption, and area overhead while maintaining maximal cache utilization efficiency and ease of programming. SHyCache improves performance for applications such as neural networks, whose large numbers of invariant weight values with high read/write access ratios can be explicitly allocated to the eNVRAM array. We simulate SHyCache on the gem5-X architectural simulator and demonstrate its utility by benchmarking a range of cache hierarchy variations using three neural networks: Inception v4, ResNet-50, and SqueezeNet 1.0. The resulting design space can be exploited to optimize performance, power consumption, or endurance, depending on the expected use case of the architecture, with maximum performance gains of 1.7/1.4/1.3x and power consumption reductions of 5.1/5.2/5.4x for Inception/ResNet/SqueezeNet, respectively.
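The paper's actual programming interface is not reproduced on this record page. As a purely illustrative C sketch of the idea the abstract describes (explicit allocation of read-dominated data to eNVRAM, analogous to explicit device-memory allocation in GP-GPU programming), one might imagine something like the following. The allocator name envram_alloc and the layer structure are invented for illustration and are not the published SHyCache API; the allocator is stubbed with malloc so the example compiles, whereas a real stack would steer the buffer to eNVRAM-backed cache lines.

    /* Illustrative sketch only, NOT the published SHyCache API:
     * envram_alloc() is a hypothetical allocator that would request
     * eNVRAM-backed cache lines; here it is stubbed with malloc. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static void *envram_alloc(size_t size)
    {
        /* Real stack: map this buffer to the eNVRAM array. Stub: heap. */
        return malloc(size);
    }

    typedef struct {
        float *weights;      /* invariant, read-dominated -> eNVRAM */
        float *activations;  /* rewritten every inference -> SRAM   */
    } layer_t;

    static layer_t make_layer(const float *trained, size_t nw, size_t na)
    {
        layer_t l;
        /* Weights are written once and read many times, so they tolerate
         * eNVRAM's long write latency and limited endurance. */
        l.weights = envram_alloc(nw * sizeof *l.weights);
        memcpy(l.weights, trained, nw * sizeof *l.weights);
        /* Activations change on every inference; keep them in SRAM. */
        l.activations = calloc(na, sizeof *l.activations);
        return l;
    }

    int main(void)
    {
        float trained[4] = {0.1f, -0.2f, 0.3f, -0.4f};
        layer_t l = make_layer(trained, 4, 8);
        printf("first weight: %f\n", l.weights[0]);
        free(l.weights);
        free(l.activations);
        return 0;
    }

The weight/activation split mirrors the abstract's rationale: neural network weights have a high read/write access ratio, which is exactly the access pattern that masks eNVRAM's write-latency and endurance drawbacks.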

Type
conference paper
DOI
10.1109/VLSI-SOC46417.2020.9344087
Author(s)
Simon, William Andrew
Levisse, Alexandre Sébastien Julien
Zapater Sancho, Marina
Atienza Alonso, David
Date Issued
2020
Published in
28th IFIP/IEEE International Conference on Very Large Scale Integration
Total pages
6
Start page
94
End page
99

Subjects
eNVRAM • STT-MRAM • hybrid caches • neural networks • low-power systems
Editorial or Peer reviewed
REVIEWED

Written at
EPFL
EPFL units
ESL
Event name
28th IFIP/IEEE International Conference on Very Large Scale Integration
Event place
Salt Lake City, Utah, USA
Event date
October 5-9, 2020

Cites
https://infoscience.epfl.ch/record/274287
Cites
https://infoscience.epfl.ch/record/129271
Available on Infoscience
August 21, 2020
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/171034