Bit-Line Computing for CNN Accelerators Co-Design in Edge AI Inference

Rios, Marco; Ponzina, Flavio; Levisse, Alexandre Sébastien Julien; Ansaloni, Giovanni; Atienza Alonso, David

doi:10.1109/TETC.2023.3237914

research article

2023

IEEE Transactions on Emerging Topics in Computing

By supporting simultaneous access to multiple memory words, Bit-line Computing (BC) architectures allow bit-wise operations to be executed in parallel, in-memory. Arithmetic operations are then derived at the array periphery with little additional overhead. This paradigm opens novel opportunities for Artificial Intelligence (AI) at the edge, thanks to the massive parallelism inherent in memory arrays and the extreme energy efficiency of computing in-situ, which avoids data transfers. Previous works have shown that, when targeting AI workloads, BC brings disruptive gains in efficiency, a key metric in emerging edge AI scenarios. This manuscript builds on these findings by proposing an end-to-end framework that leverages BC-specific optimizations to enable high parallelism and aggressive compression of AI models. Our approach is supported by a novel hardware module performing real-time decoding, as well as new algorithms that enable BC-friendly model compression. Our hardware/software approach results in 91% energy savings (under a 1% accuracy degradation constraint) compared with state-of-the-art BC approaches.
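The mechanism described in the first two sentences of the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the design evaluated in the article: it assumes the behaviour commonly described for SRAM-based bit-line computing, where activating two word lines at once yields the AND of the stored words on the bit-line and their NOR on the complementary bit-line. The 8-bit word width, the function names, and the iterative carry loop are all hypothetical choices made for illustration.

```python
WIDTH = 8                 # assumed word width, chosen only for illustration
MASK = (1 << WIDTH) - 1   # keeps every value within WIDTH bits


def bitline_read(a, b):
    """Model activating two word lines at once (assumed SRAM behaviour).

    Returns (AND, NOR) of the two stored words, standing in for the
    values sensed on the bit-line and the complementary bit-line.
    """
    bl = a & b
    blb = ~(a | b) & MASK
    return bl, blb


def periphery_xor(bl, blb):
    """Derive XOR at the periphery: a bit is 1 where neither AND nor NOR is."""
    return ~(bl | blb) & MASK


def periphery_add(a, b):
    """Add two words modulo 2**WIDTH using only AND/NOR reads.

    Each pass computes the partial sum (XOR) and the carries (AND shifted
    left by one), then repeats until no carries remain. Real hardware
    would use dedicated carry logic; this loop is a software stand-in.
    """
    while b:
        bl, blb = bitline_read(a, b)
        a, b = periphery_xor(bl, blb), (bl << 1) & MASK
    return a


if __name__ == "__main__":
    bl, blb = bitline_read(0b1100, 0b1010)
    print(bin(bl), bin(periphery_xor(bl, blb)), periphery_add(12, 10))
```

In this model every bit position is handled at once by the bitwise operators, which loosely mirrors how all columns of a memory array compute in parallel; the arithmetic is built entirely from the two values a single multi-word access provides.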

Name

IEEE_TETC_Bitline_Computing_as_CNN_accelerator_2023.pdf

Type

Postprint

Version

Accepted version

Access type

openaccess

License Condition

copyright

Size

12.53 MB

Format

Adobe PDF

Checksum (MD5)

1137232d3e4a02c88d3cb9fccc443c85