Abstract

Machine learning algorithms such as Convolutional Neural Networks (CNNs) are characterized by high robustness towards quantization, supporting small-bitwidth fixed-point arithmetic at inference time with little to no degradation in accuracy. In turn, small-bitwidth arithmetic can avoid using area-and-energy-hungry combinational multipliers, employing instead iterative shift-add operations. Crucially, this approach paves the way for very efficient data-level-parallel computing architectures, which allow fine-grained control of the operand bitwidths at run time to realize heterogeneous quantization schemes. For the first time, we herein analyze a novel scaling opportunity offered by shift-add architectures, which emerges from the relation between the bitwidth of operands and their effective critical path timing at run time. Employing post-layout simulations, we show that significant operating frequency increases can be achieved (by as much as 4.13× in our target architecture) at run time, with respect to the nominal design-time frequency constraint. Critically, by exploiting the ensuing Dynamic Bitwidth-Frequency Scaling (DBFS), speedups of up to 73% are achieved in our experiments when executing quantized CNNs, with respect to an alternative solution based on a combinational multiplier-adder that occupies 2.35× more area and requires 51% more energy.
