Abstract

Inference with Convolutional Neural Networks (CNNs) is resource- and energy-intensive. Therefore, their execution on highly constrained edge devices demands the careful co-optimization of algorithms and hardware. To address this challenge, in this paper we present a flexible In-Memory Computing (IMC) architecture and circuit that can scale data representations to varying bitwidths at run-time, while ensuring a high degree of parallelism and a low area footprint. Moreover, we introduce a novel optimization heuristic that tailors the quantization level of each CNN layer according to its workload and robustness considerations. We investigate the performance, accuracy, and energy requirements of our co-design approach on CNNs of varying sizes, obtaining efficiency increases of up to 76.2% and run-time reductions of up to 75.6% with respect to fixed-bitwidth alternatives, with negligible accuracy degradation.
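
As an illustration of such per-layer tailoring, the sketch below greedily assigns narrower bitwidths to the layers that offer the largest workload savings for the smallest estimated accuracy cost. The data structures, the greedy rule, and all numbers are illustrative assumptions, not the heuristic proposed in the paper.

# Hypothetical sketch of per-layer bitwidth selection driven by workload and
# robustness scores. Layer, assign_bitwidths, and the greedy rule are
# illustrative assumptions, not the optimization heuristic from the paper.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Layer:
    name: str
    macs: int           # workload: multiply-accumulate operations in this layer
    sensitivity: float  # estimated accuracy drop per bit removed (lower = more robust)

def assign_bitwidths(layers: List[Layer],
                     candidate_bits=(8, 6, 4, 2),
                     accuracy_budget: float = 0.5) -> Dict[str, int]:
    """Greedily lower the bitwidth of the most 'profitable' layers:
    large workload (big energy/run-time savings) and low sensitivity
    (small expected accuracy loss), until the accuracy budget is spent."""
    bits = {l.name: candidate_bits[0] for l in layers}  # start every layer at the widest format
    spent = 0.0
    # Rank layers by workload saved per unit of expected accuracy cost, best first.
    ranked = sorted(layers, key=lambda l: l.macs / (l.sensitivity + 1e-9), reverse=True)
    for layer in ranked:
        for b in candidate_bits[1:]:  # try progressively narrower formats
            step_cost = layer.sensitivity * (bits[layer.name] - b)
            if spent + step_cost > accuracy_budget:
                break
            bits[layer.name] = b
            spent += step_cost
    return bits

if __name__ == "__main__":
    net = [Layer("conv1", macs=10_000_000, sensitivity=0.20),
           Layer("conv2", macs=50_000_000, sensitivity=0.05),
           Layer("fc",    macs=2_000_000,  sensitivity=0.30)]
    print(assign_bitwidths(net))  # e.g. {'conv1': 8, 'conv2': 2, 'fc': 8}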
