Abstract

The problem of energy optimization in multi-core systems (such as single-chip multiprocessors), where the individual energy demands of the processing elements are governed by instantaneous workload requirements, is well defined in the literature. The significance of the problem is underlined by the increasing prominence of multi-core systems that must operate under strict power and energy budget constraints, both in mobile applications and in cases where special cooling arrangements can be very expensive. A range of solutions has been proposed over the last few years, mostly based on static, off-line calculation of a limited set of operating points in the form of optimum voltage and frequency assignments, which are subsequently chosen according to actual demand. Still, to the best of our knowledge, none of these studies has demonstrated an on-line solution to the complex, multi-variable energy optimization problem that allows dynamic adjustment of the individual operating frequencies and supply voltages of multiple processing elements. This thesis presents the design and silicon implementation of an analog-based energy optimizer unit that is capable of dynamically adjusting the supply voltages and clock frequencies of multiple embedded cores, tailored to the instantaneous workload information (computational task) and fully adaptive to variations in process and temperature. Our approach borrows from the basic principles of analog computation to continuously optimize the system-wide energy dissipation of multiple processing elements, converging on the global minima of the constrained optimization problem, which are represented as stable operating points of a simple feedback loop. It is already well known that stable, approximate solutions of multi-variable optimization problems can be obtained (for example, by gradient descent) using very compact analog circuits, e.g. resistive networks.
The thesis exploits the analogy between the energy minimization problem under timing constraints in a general task graph and the power minimization problem under Kirchhoff's current law constraints in an equivalent resistive network. The implementation of the on-line analog optimizer is then discussed. The realization of the blocks composing the system architecture is described, and circuit design issues are studied thoroughly. A three-loop demonstrator circuit of the proposed analog optimizer architecture has been implemented in a 0.18 μm standard digital CMOS process. The overall circuit area of the optimizer is 245 μm × 650 μm, excluding decoupling capacitors, while each loop circuit occupies only 180 μm × 120 μm. Operating at a nominal supply of 1.8 V, the circuit supports the desired frequency range of 170 MHz to 290 MHz and the voltage range of 1.2 V to 1.8 V. Estimated workload levels for each task (loop) are provided as 4-bit binary inputs, and the corresponding solution for minimum energy consumption is observed as the supply voltages and operating frequencies assigned to each processing element for a given task duration. The measured worst-case settling time for the supply voltages is less than 50 μs. The average power consumption of the entire three-loop optimizer is 4 mW. The measurements experimentally validate the concept of a fully analog, current-based solution for on-line energy minimization in complex multi-core systems under varying workload conditions. Key functional blocks of the proposed circuit operate in weak inversion, resulting in very low power dissipation for the optimizer. The prototype also demonstrates that the proposed optimizer block is capable of taking into account on-chip variations of temperature and process parameters. As such, it can be used as a generic building block for on-line energy optimization in complex systems.
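
The resistive-network side of the analogy can be illustrated numerically. The sketch below is not the thesis circuit or its task-graph mapping: it assumes the simplest possible network (a few resistors in parallel, fed by a fixed total current) and uses projected gradient descent in software as a stand-in for the analog settling behaviour. The function names `min_power_currents` and `current_divider`, the resistor values, and the step size are all hypothetical choices made for illustration. It shows that minimizing the dissipated power, subject to Kirchhoff's current law, reproduces the familiar current-divider solution.

```python
# Illustrative sketch only: a software stand-in for the resistive-network
# analogy, NOT the analog optimizer described in the thesis.
# All names and numeric values here are assumptions made for the example.

def min_power_currents(resistances, total_current, steps=5000, lr=0.01):
    """Minimize P = sum(R_i * I_i**2) subject to sum(I_i) = total_current.

    Uses projected gradient descent: the gradient is projected onto the
    plane where the branch currents keep a constant sum (KCL).
    """
    n = len(resistances)
    currents = [total_current / n] * n  # feasible start: KCL already holds
    for _ in range(steps):
        grads = [2.0 * r * i for r, i in zip(resistances, currents)]
        mean_grad = sum(grads) / n
        # Removing the mean gradient leaves the sum of currents unchanged.
        currents = [i - lr * (g - mean_grad) for i, g in zip(currents, grads)]
    return currents


def current_divider(resistances, total_current):
    """Closed-form branch currents of parallel resistors (current divider)."""
    conductances = [1.0 / r for r in resistances]
    g_total = sum(conductances)
    return [total_current * g / g_total for g in conductances]


if __name__ == "__main__":
    r_values = [1.0, 2.0, 4.0]  # ohms (hypothetical)
    i_total = 7.0               # amperes (hypothetical)
    print("gradient descent:", min_power_currents(r_values, i_total))
    print("current divider: ", current_divider(r_values, i_total))
```

The two results agree to within numerical tolerance, which is the property the analog approach relies on: the physical network settles to the constrained power minimum on its own, with no iterative computation.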
