Convolution Engine: Balancing Efficiency and Flexibility in Specialized Computing

Authors: Qadeer, Wajahat; Hameed, Rehan; Shacham, Ofer; Venkatesan, Preethi; Kozyrakis, Christos; Horowitz, Mark
Year: 2015
Record date: 2015-09-23
DOI: 10.1145/2735841
URL: https://infoscience.epfl.ch/handle/20.500.14299/118578
Type: journal article (research article)
Keywords: Specialization; energy efficiency; video processing; accelerator

Abstract: General-purpose processors are tremendously versatile, but they pay a huge cost for this flexibility, wasting over 99% of their energy on programmability overheads. We observe that reducing this waste requires tuning data storage and compute structures, and the connectivity between them, to the data-flow and data-locality patterns of the algorithms. Hence, by backing off from full programmability and instead targeting key data-flow patterns used in a domain, we can create efficient engines that can be programmed and reused across a wide range of applications within that domain. We present the Convolution Engine (CE), a programmable processor specialized for the convolution-like data flow prevalent in computational photography, computer vision, and video processing. The CE achieves energy efficiency by capturing data-reuse patterns, eliminating data-transfer overheads, and enabling a large number of operations per memory access. We demonstrate that the CE is within a factor of 2–3× of the energy and area efficiency of custom units optimized for a single kernel. For most image processing applications, the CE improves energy and area efficiency by 8–15× over data-parallel Single Instruction Multiple Data (SIMD) engines.
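The abstract credits much of the CE's efficiency to capturing data reuse and performing many operations per memory access. The sketch below illustrates that idea in software only. It is not the CE's actual hardware or programming interface. It assumes a 1D signal, uses correlation-style indexing as is common in image processing, and lets a Python `deque` stand in for a hardware shift register. The function names `conv1d_naive` and `conv1d_shift_register` are hypothetical. Both functions perform the same multiply-accumulates, but the shift-register version loads each input only once.

```python
from collections import deque


def conv1d_naive(signal, taps):
    """Direct 1D convolution (valid mode, correlation-style indexing).

    Each output re-reads all k inputs in its window, so the number of
    memory loads equals the number of multiply-accumulates.
    Returns (outputs, number_of_input_loads).
    """
    k = len(taps)
    out, loads = [], 0
    for i in range(len(signal) - k + 1):
        acc = 0
        for j in range(k):
            acc += signal[i + j] * taps[j]  # one load per multiply
            loads += 1
        out.append(acc)
    return out, loads


def conv1d_shift_register(signal, taps):
    """Same outputs, but each input is loaded once into a k-entry window.

    The deque (maxlen=k) models a shift register: appending a new element
    shifts out the oldest one, and every element stays in the window long
    enough to be reused by k consecutive outputs.
    Returns (outputs, number_of_input_loads).
    """
    k = len(taps)
    window = deque(maxlen=k)
    out, loads = [], 0
    for x in signal:
        window.append(x)  # single load per input element
        loads += 1
        if len(window) == k:
            # window holds signal[i .. i+k-1], oldest first, matching taps order
            out.append(sum(w * t for w, t in zip(window, taps)))
    return out, loads
```

For example, with `signal = [1, 2, 3, 4, 5, 6]` and `taps = [1, 0, -1]`, both functions produce `[-2, -2, -2, -2]`. The naive version performs 12 loads for 12 multiplies, and the shift-register version performs 6. For long signals, the shift-register version's ratio of multiplies to loads approaches k. In hardware, that ratio is what reduces energy per operation, because register accesses are much cheaper than memory accesses.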