BLADE: A BitLine Accelerator for Devices on the Edge

The increasing ubiquity of edge devices in the consumer market, along with their ever more computationally expensive workloads, necessitates corresponding increases in computing power to support such workloads. In-memory computing is attractive in edge devices as it reuses preexisting memory elements, thus limiting area overhead. Additionally, in-SRAM Computing (iSC) efficiently performs computations on spatially local data found in a variety of emerging edge device workloads. We therefore propose, implement, and benchmark BLADE, a BitLine Accelerator for Devices on the Edge. BLADE is an iSC architecture that can perform massive SIMD-like complex operations on hundreds to thousands of operands simultaneously. We implement BLADE in 28 nm CMOS and demonstrate its functionality down to 0.6 V, lower than any conventional state-of-the-art iSC architecture. We also benchmark BLADE in conjunction with a full Linux software stack in the gem5 architectural simulator, providing a robust demonstration of its performance gain in comparison to an equivalent embedded processor equipped with a NEON SIMD co-processor. We benchmark BLADE with three emerging edge device workloads, namely cryptography, high efficiency video coding, and convolutional neural networks, and demonstrate 4x, 6x, and 3x performance improvements, respectively, in comparison to a baseline CPU/NEON processor at an equivalent power budget.
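
As a rough illustration of the SIMD-like bitline computing idea mentioned in the abstract, the following Python sketch models a generic bitline operation in software: activating two wordlines at once is commonly described as yielding the bitwise AND of the two rows on the bitline and their NOR on the complementary bitline, from which lane-wise addition can be derived. This is a behavioral toy model under that assumption, not BLADE's actual circuit or carry logic; the function names (`bitline_read`, `simd_add`, `pack`, `unpack`) and the 8-bit lane width are hypothetical choices made for the example.

```python
def pack(values, lane_bits):
    """Pack a list of small integers into one wide 'memory row' (lane 0 in the low bits)."""
    row = 0
    for i, v in enumerate(values):
        row |= (v & ((1 << lane_bits) - 1)) << (i * lane_bits)
    return row


def unpack(row, lanes, lane_bits):
    """Split a wide row back into its per-lane integer values."""
    lane_mask = (1 << lane_bits) - 1
    return [(row >> (i * lane_bits)) & lane_mask for i in range(lanes)]


def bitline_read(row_a, row_b, width):
    """Model reading two rows at once (assumed generic bitline behavior).

    Returns (AND, NOR) of the two rows, as would be sensed on the bitline
    and the complementary bitline respectively.
    """
    mask = (1 << width) - 1
    bl = row_a & row_b               # high only where both cells store 1
    blb = ~(row_a | row_b) & mask    # high only where both cells store 0
    return bl, blb


def simd_add(row_a, row_b, lanes, lane_bits):
    """Lane-wise addition built only from the AND/NOR outputs.

    Carries ripple inside each lane and are dropped at lane boundaries,
    so every lane wraps around modulo 2**lane_bits.
    """
    width = lanes * lane_bits
    mask = (1 << width) - 1
    lane_mask = (1 << lane_bits) - 1
    and_bits, nor_bits = bitline_read(row_a, row_b, width)
    xor_bits = ~(and_bits | nor_bits) & mask  # XOR = NOT(AND OR NOR)

    result = 0
    for lane in range(lanes):
        shift = lane * lane_bits
        generate = (and_bits >> shift) & lane_mask
        propagate = (xor_bits >> shift) & lane_mask
        carry = 0
        lane_sum = 0
        for bit in range(lane_bits):
            g = (generate >> bit) & 1
            p = (propagate >> bit) & 1
            lane_sum |= (p ^ carry) << bit
            carry = g | (p & carry)
        result |= lane_sum << shift
    return result


if __name__ == "__main__":
    a = pack([200, 17, 255, 3], 8)
    b = pack([100, 5, 1, 4], 8)
    print(unpack(simd_add(a, b, lanes=4, lane_bits=8), 4, 8))
```

In this model a single pair of rows holds four 8-bit operands each, and one call produces all four sums; a real array would operate on far wider rows, which is where the "hundreds to thousands of operands" figure in the abstract comes from.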

Published in:
Proceedings of 29th Edition of the ACM Great Lakes Symposium on VLSI (GLSVLSI 2019)
Presented at:
29th Edition of the ACM Great Lakes Symposium on VLSI (GLSVLSI 2019), Tysons Corner, VA, USA, May 9-11, 2019
May 09 2019

 Record created 2019-03-12, last modified 2019-08-12
