An Evaluation Framework for Dynamic Thermal Management Strategies in 3D MultiProcessor System-on-Chip Co-Design
Dynamic thermal management (DTM) has been widely adopted to improve the energy efficiency, reliability, and performance of modern MultiProcessor SoCs (MPSoCs) at runtime. However, evolving industry trends and heterogeneous architecture designs pose significant challenges for state-of-the-art DTM methods. Specifically, the emergence of heterogeneous designs has led to more localized and non-uniform hotspots, which call for accurate and responsive DTM strategies. Additionally, the growing number of cores to be managed requires the DTM to optimize and coordinate all aspects of the system. Addressing these challenges requires a DTM evaluation framework that combines precise thermal modeling of localized hotspots with fast architecture simulation. However, existing methodologies fall short in both areas, hindering the development and exploration of new DTM approaches. To tackle these challenges, we first introduce 3D-ICE 3.1, the latest version of 3D-ICE, which features a novel non-uniform thermal modeling technique that improves the accuracy of thermal analysis and reduces overhead. Then, in conjunction with an efficient and fast offline application profiling strategy based on the architecture simulator gem5-X, we propose a novel DTM evaluation framework. This framework enables us to explore novel DTM methods to optimize the energy efficiency, reliability, and performance of contemporary 3D MPSoCs. The experimental results demonstrate that, by allowing customized discretization levels of the thermal grids, 3D-ICE 3.1 achieves high accuracy, with a mean temperature error of only 0.3 K, without increasing the overall computation overhead. Subsequently, we evaluate various DTM methods using this framework and propose a Multi-Agent Reinforcement Learning (MARL) control to address the demanding thermal challenges of 3D MPSoCs. The experimental results show that the proposed MARL-based DTM method can reduce power consumption by 13% while maintaining a performance level similar to that of the comparison methods.
2024_TPDS_DTM_framework_final.pdf