Title: Running Efficiently CNNs on the Edge Thanks to Hybrid SRAM-RRAM In-Memory Computing
Authors: Rios, Marco Antonio; Ponzina, Flavio; Ansaloni, Giovanni; Levisse, Alexandre Sébastien Julien; Atienza Alonso, David
Dates: 2020-11-30; 2021-02-01
DOI: 10.23919/DATE51398.2021.9474233
Handle: https://infoscience.epfl.ch/handle/20.500.14299/173734
Web of Science: WOS:000805289900352
Type: Conference paper

Abstract: The increasing size of Convolutional Neural Networks (CNNs) and the high computational workload required for inference pose major challenges for their deployment on resource-constrained edge devices. In this paper, we address these challenges by proposing a novel In-Memory Computing (IMC) architecture. Our IMC strategy performs arithmetic operations efficiently through bitline computing, enabling a high degree of parallelism while reducing energy-costly data transfers. Moreover, it features a hybrid memory structure in which a portion of each subarray, dedicated to storing CNN weights, is implemented as high-density, zero-standby-power Resistive RAM (RRAM). Finally, it exploits an innovative method for storing quantized weights according to their value, named Weight Data Mapping (WDM), which further increases efficiency. Compared to state-of-the-art IMC alternatives, our solution provides improvements of up to 93% in energy efficiency and up to 6x lower run-time when performing inference on the MobileNet and AlexNet neural networks.

Keywords: SRAM; RRAM; In-Memory Computing; CNN; Edge Computing
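
Note: the abstract credits the architecture's parallel arithmetic to bitline computing. As a loose software illustration of that general technique only (not of the paper's specific circuit, hybrid SRAM-RRAM subarray, or WDM scheme), the Python sketch below models how asserting two SRAM wordlines at once senses bitwise AND on the bitline and NOR on its complement, and how bit-serial addition across many columns follows from those primitives. All function names here are hypothetical.

    import numpy as np

    def bitline_and(a_row, b_row):
        # Two wordlines asserted: the bitline stays high only where both cells
        # store 1, so it senses a bitwise AND across all columns in parallel.
        return a_row & b_row

    def bitline_nor(a_row, b_row):
        # The complementary bitline (BLB) senses the bitwise NOR.
        return 1 - (a_row | b_row)

    def bitline_xor(a_row, b_row):
        # XOR composed from the two sensed values: NOT(AND) AND NOT(NOR).
        return (1 - bitline_and(a_row, b_row)) & (1 - bitline_nor(a_row, b_row))

    def bit_serial_add(a_bits, b_bits):
        # Bit-serial ripple-carry addition over transposed operands:
        # a_bits[i] holds bit i (LSB first) of every operand, one operand per
        # column, so an N-column subarray adds N operand pairs at once.
        n_bits, n_cols = a_bits.shape
        carry = np.zeros(n_cols, dtype=np.uint8)
        out = np.zeros((n_bits + 1, n_cols), dtype=np.uint8)
        for i in range(n_bits):
            p = bitline_xor(a_bits[i], b_bits[i])   # propagate
            g = bitline_and(a_bits[i], b_bits[i])   # generate
            out[i] = p ^ carry                      # full-adder sum bit
            carry = g | (carry & p)                 # full-adder carry out
        out[n_bits] = carry
        return out

    # Example: add six pairs of 4-bit operands stored column-wise.
    rng = np.random.default_rng(0)
    a = rng.integers(0, 16, size=6)
    b = rng.integers(0, 16, size=6)
    to_bits = lambda v: np.array([(v >> i) & 1 for i in range(4)], dtype=np.uint8)
    s_bits = bit_serial_add(to_bits(a), to_bits(b))
    s = sum(s_bits[i].astype(int) << i for i in range(5))
    assert np.array_equal(s, a + b)

In this model the parallelism comes from operating on every column of a subarray with a single pair of row activations, which is the property the abstract invokes to reduce energy-costly data transfers.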