Survey of Precision-Scalable Multiply-Accumulate Units for Neural-Network Processing

The current trend in deep learning brings an enormous computational demand: billions of Multiply-Accumulate (MAC) operations per inference. Fortunately, reduced precision has demonstrated large benefits with low impact on accuracy, paving the way towards processing on mobile devices and IoT nodes. Precision-scalable MAC architectures optimized for neural networks have recently gained interest thanks to their subword-parallel or bit-serial capabilities. Yet it has been hard to judge their relative benefits fairly, as they have been implemented in different technologies and for different performance targets. In this work, run-time configurable MAC units from ISSCC 2017 and 2018 are implemented and compared objectively under diverse precision scenarios. All circuits are synthesized in a 28 nm commercial CMOS process with precision ranging from 2 to 8 bits. This work analyzes the impact of scalability and compares the different MAC units in terms of energy, throughput and area, aiming to identify the optimal architectures for reducing computation costs in neural-network processing.
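As a rough illustration of the two scaling styles named in the abstract, the following is a minimal functional sketch in Python, not a model of any of the surveyed circuits. The function names, the unsigned-operand restriction, and the choice of one lane per `precision`-bit slice of an 8-bit datapath are all assumptions made for this example. Real designs differ in signedness handling, lane count, and accumulator width.

```python
def bit_serial_mac(acc, weight, activation, precision):
    """Accumulate weight * activation, consuming one activation bit per step.

    Assumption: unsigned operands; `activation` must fit in `precision` bits.
    Lower precision means fewer loop iterations (fewer cycles in hardware).
    """
    if not 0 <= activation < (1 << precision):
        raise ValueError("activation does not fit in the requested precision")
    for bit in range(precision):      # one iteration models one cycle
        if (activation >> bit) & 1:
            acc += weight << bit      # shift-and-add partial product
    return acc


def subword_parallel_mac(acc, weights, activations, precision, width=8):
    """Accumulate several low-precision products in one step.

    Assumption: a `width`-bit datapath is split into width // precision
    independent lanes, each handling one unsigned operand pair.
    """
    lanes = width // precision
    if len(weights) != lanes or len(activations) != lanes:
        raise ValueError("expected exactly one operand pair per lane")
    limit = 1 << precision
    for w, a in zip(weights, activations):
        if not (0 <= w < limit and 0 <= a < limit):
            raise ValueError("operand does not fit in the requested precision")
        acc += w * a                  # all lanes would fire in the same cycle
    return acc


# 2-bit activation 3 (0b11) times weight 5, computed in two serial steps.
serial_result = bit_serial_mac(0, 5, 3, precision=2)

# Four 2-bit lanes sharing an 8-bit datapath: 1*3 + 2*3 + 3*2 + 1*1.
parallel_result = subword_parallel_mac(0, [1, 2, 3, 1], [3, 3, 2, 1], precision=2)
```

In this sketch the bit-serial unit trades latency for precision (fewer bits, fewer steps), while the subword-parallel unit trades precision for throughput (fewer bits, more products per step). That is the kind of trade-off the abstract says is compared in terms of energy, throughput and area.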

Published in:
2019 IEEE 1st International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Mar 18 2019
New York, IEEE


 Record created 2019-03-13, last modified 2020-10-29
