000267670 001__ 267670
000267670 005__ 20190812204805.0
000267670 037__ $$aCONF
000267670 245__ $$aA Product Engine for Energy-Efficient Execution of Binary Neural Networks Using Resistive Memories
000267670 260__ $$c2019-10-06
000267670 269__ $$a2019-10-06
000267670 300__ $$a6
000267670 336__ $$aConference Papers
000267670 520__ $$aThe need to run complex Machine Learning (ML) algorithms, such as Convolutional Neural Networks (CNNs), on edge devices, which are highly constrained in terms of computing power and energy, makes it important to execute such applications efficiently. This has led to the popularization of Binary Neural Networks (BNNs), which significantly reduce execution time and memory requirements by representing the weights (and possibly the data being operated on) using only one bit. Because approximately 90% of the operations executed by CNNs and BNNs are convolutions, a significant part of the memory transfers consists of fetching the convolutional kernels. Such kernels are usually small (e.g., 3×3 operands), and, particularly in BNNs, redundancy among them is expected. Therefore, equal kernels can be mapped to the same memory addresses, requiring significantly less memory to store them. In this context, this paper presents a custom Binary Dot Product Engine (BDPE) for BNNs that exploits the features of Resistive Random-Access Memories (RRAMs). This new engine accelerates the inference phase of BNNs: it locally stores the most frequently used binary weights and performs binary convolutions using the computing capabilities enabled by the RRAMs. The system-level gem5 architectural simulator was used together with a C-based ML framework to evaluate the system’s performance and obtain power results. Results show that this novel BDPE improves performance by 11.3% and energy efficiency by 7.4%, and reduces the number of memory accesses by 10.7%, at a cost of less than 0.3% additional die area, when integrated with a 28 nm Fully Depleted Silicon-On-Insulator ARMv8 in-order core, compared to a fully optimized baseline of YoloV3 XNOR-Net running on an unmodified Central Processing Unit.
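[Editor's illustration, not part of the record: a minimal C sketch of the XNOR-popcount binary dot product that an engine like the BDPE computes. With weights and activations encoded as +1/-1 and packed into bit vectors, the dot product of two N-bit vectors reduces to 2*popcount(XNOR(a, w)) - N. All names and values below are hypothetical; the paper's hardware mapping onto RRAMs is not reproduced here.]

    #include <stdint.h>
    #include <stdio.h>

    /* Binary dot product via XNOR + popcount, assuming +1/-1 values
     * are packed one per bit (1 encodes +1, 0 encodes -1). */
    static int binary_dot(uint64_t a, uint64_t w, int n_bits)
    {
        uint64_t xnor = ~(a ^ w);                 /* 1 where signs agree   */
        if (n_bits < 64)
            xnor &= (1ULL << n_bits) - 1;         /* mask unused high bits */
        int matches = __builtin_popcountll(xnor); /* agreeing positions    */
        return 2 * matches - n_bits;              /* +1 per match, -1 else */
    }

    int main(void)
    {
        /* A hypothetical 3x3 binary kernel and input patch, 9 bits each. */
        uint64_t kernel = 0x1B5; /* 0b110110101 */
        uint64_t patch  = 0x0F3; /* 0b011110011 */
        printf("dot = %d\n", binary_dot(patch, kernel, 9));
        return 0;
    }

[Kernel redundancy as described in the abstract follows directly from this encoding: a 3x3 binary kernel has only 2^9 = 512 possible values, so equal kernels recur and can share one stored copy.]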
000267670 6531_ $$aMachine Learning, Edge Devices, Binary Neural Networks, RRAM-based Binary Dot Product Engine
000267670 700__ $$aVieira, Joao
000267670 700__ $$aGiacomin, Edouard
000267670 700__ $$0249987$$aQureshi, Yasir Mahmood$$g264584
000267670 700__ $$0250076$$aZapater Sancho, Marina$$g264565
000267670 700__ $$aTang, Xifan
000267670 700__ $$aKvatinsky, Shahar
000267670 700__ $$0240268$$aAtienza Alonso, David$$g169199
000267670 700__ $$aGaillardon, Pierre-Emmanuel
000267670 7112_ $$a27th IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC)$$cCuzco, Peru$$dOctober 6-9, 2019
000267670 8560_ $$fyasir.qureshi@epfl.ch
000267670 8564_ $$uhttps://infoscience.epfl.ch/record/267670/files/2019_VLSI_SoC_Joao.pdf$$zPREPRINT$$s309126
000267670 909C0 $$mhomeira.salimi@epfl.ch$$mdavid.atienza@epfl.ch$$0252050$$zMarselli, Béatrice$$xU11977$$pESL
000267670 909CO $$pconf$$pSTI$$ooai:infoscience.epfl.ch:267670
000267670 960__ $$ayasir.qureshi@epfl.ch
000267670 961__ $$afantin.reichler@epfl.ch
000267670 973__ $$aEPFL$$rREVIEWED
000267670 980__ $$aCONF
000267670 981__ $$aoverwrite