Title: DBFS: Dynamic Bitwidth-Frequency Scaling for Efficient Software-defined SIMD

Authors: Yu, Pengbo; Ponzina, Flavio; Levisse, Alexandre Sébastien Julien; Biswas, Dwaipayan; Ansaloni, Giovanni; Atienza, David; Catthoor, Francky

Dates: 2024-05-07 (issued); 2024-05-16 (available)

Handle: https://infoscience.epfl.ch/handle/20.500.14299/208024

Abstract: Machine learning algorithms such as Convolutional Neural Networks (CNNs) are characterized by high robustness towards quantization, supporting small-bitwidth fixed-point arithmetic at inference time with little to no degradation in accuracy. In turn, small-bitwidth arithmetic can avoid using area-and-energy-hungry combinational multipliers, employing instead iterative shift-add operations. Crucially, this approach paves the way for very efficient data-level-parallel computing architectures, which allow fine-grained control of the operand bitwidths at run time to realize heterogeneous quantization schemes. For the first time, we herein analyze a novel scaling opportunity offered by shift-add architectures, which emerges from the relation between the bitwidth of operands and their effective critical path timing at run time. Employing post-layout simulations, we show that significant operating frequency increases can be achieved (by as much as 4.13× in our target architecture) at run time, with respect to the nominal design-time frequency constraint. Critically, by exploiting the ensuing Dynamic Bitwidth-Frequency Scaling (DBFS), speedups of up to 73% are achieved in our experiments when executing quantized CNNs, with respect to an alternative solution based on a combinational multiplier-adder that occupies 2.35× more area and requires 51% more energy.

Keywords: Low power architecture; Edge machine learning; Software-defined SIMD; Dynamic Bitwidth-Frequency Scaling (DBFS)

Type: text::conference output::conference proceedings::conference paper
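To illustrate the iterative shift-add principle the abstract refers to, the following is a minimal software sketch (not the paper's hardware datapath; the function name, signature, and the explicit bitwidth parameter are illustrative assumptions). It shows why the amount of work, and in a serial shift-add datapath the effective run-time critical path, scales with the operand bitwidth: a 4-bit quantized operand needs only four shift-add iterations, whereas an 8-bit one needs eight.

```c
#include <stdint.h>
#include <stdio.h>

/* Iterative shift-add multiplication: one partial product is accumulated per
 * multiplier bit, so the number of iterations is set by the operand bitwidth
 * rather than by a fixed full-width combinational multiplier.
 * NOTE: illustrative sketch only; the DBFS architecture in the paper realizes
 * this in hardware with run-time-controlled bitwidths. */
static uint32_t shift_add_mul(uint16_t a, uint16_t b, unsigned bitwidth)
{
    uint32_t acc = 0;
    for (unsigned i = 0; i < bitwidth; i++) {
        if ((b >> i) & 1u)                 /* add shifted multiplicand when bit i of b is set */
            acc += (uint32_t)a << i;
    }
    return acc;
}

int main(void)
{
    /* 4-bit quantized operands: 4 iterations. */
    printf("%u\n", shift_add_mul(13, 11, 4));   /* prints 143 */
    /* 8-bit operands: 8 iterations. */
    printf("%u\n", shift_add_mul(200, 100, 8)); /* prints 20000 */
    return 0;
}
```

In a hardware realization of this loop, reducing the operand bitwidth shortens the iteration count per multiplication, which is the run-time timing slack that DBFS converts into a higher operating frequency than the nominal design-time constraint.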