In recent decades, the rapid growth of cloud computing has transformed the technological landscape and profoundly impacted our daily lives, driven by the demand for secure, flexible, and cost-effective computing solutions. To meet these demands, computing servers have become increasingly complex, featuring multi-layered abstraction architectures that span hardware components, cloud infrastructures, and software applications. This thesis addresses the management challenges arising from this complexity by introducing multi-objective optimization methods specifically designed for many-core computing servers, offering comprehensive solutions that account for hardware, server, and software components across the entire computing ecosystem.
More specifically, at the multiprocessor system-on-chip (MPSoC) level, this thesis first introduces 3D-ICE 3.1, a thermal simulator equipped with novel non-uniform modeling techniques that improve the efficiency and accuracy of thermal modeling for emerging heterogeneous MPSoCs. Building on the capabilities of 3D-ICE 3.1, an accelerated dynamic thermal management (DTM) evaluation framework is developed to enable a comprehensive assessment of DTM methods. Leveraging this DTM evaluation framework, a multi-agent reinforcement learning (MARL)-based thermal management scheme is proposed that exploits the potential of heterogeneous MPSoCs to reduce power consumption while maintaining a performance level similar to that of the comparison methods.
The improved thermal modeling and management capability provided by the proposed DTM evaluation framework enables more sophisticated server-level optimization strategies, leading to the development of optimal control and machine learning (ML)-based techniques. These techniques enhance server performance while complying with strict thermal and reliability constraints. Moreover, the framework explores dynamic task queue management and architectural innovations, such as hybrid cache configurations, to further optimize server performance and energy efficiency. Collectively, these advancements yield significant performance improvements for computing servers.
Recognizing the significant impact of runtime application demands, i.e., workloads, on server management decisions, this thesis also explores application-level optimization techniques. It starts with ML methods that predict application performance based solely on low-level hardware metrics, under a black-box assumption. These predictive models can effectively differentiate between performance variations caused by interference from collocated virtual machines (VMs) or users on the same physical server and those resulting from normal workload fluctuations. By providing accurate performance forecasts, these models enhance server resource management by better anticipating and accommodating the performance requirements of different applications. Building on this foundation, a novel workload-aware frequency scaling governor is introduced to optimize energy efficiency in cloud scenarios.
Overall, this thesis demonstrates the potential and benefits of multi-objective optimization for many-core servers by integrating accurate modeling, detailed application profiling, and advanced control strategies. These efforts not only enhance the reliability, performance, and energy efficiency of computing servers but also contribute to environmental sustainability, advancing the field of green computing.