Architecture Exploration and Optimization of Heterogeneous Many-Core Compute and Memory Architectures with Architectural Extensions
The rapid proliferation of Internet connectivity and the growing adoption of digital products have transformed many spheres of everyday life. This increased digitization of society has led to the emergence of new applications, which are deployed all the way from High Performance Computing (HPC) systems and cloud servers to mobile devices. These emerging applications, such as video analytics, autonomous driving, natural language processing, content recommendation, bioinformatics, and genome sequencing, have different performance and energy requirements and are deployed on a variety of platforms. To meet performance and Quality-of-Service (QoS) constraints, cloud servers and HPC platforms are built from many-core processors. Even the processors in today's mobile and edge devices are multi-core. Optimizing the energy efficiency and performance of such systems requires a system-level simulator that can concurrently execute multi-threaded applications in a full-system environment, with a complete operating system (OS), on a heterogeneous many-core system.
In this thesis, I present gem5-X, a system-level simulation platform for optimizing many-core heterogeneous compute and memory architectures. A gem5-X exploration and optimization methodology is also proposed to optimize both the compute and memory sub-systems for multi-threaded applications. Gem5-X extends gem5 with architectural extensions for the compute and memory sub-systems, including an in-cache computing accelerator, clustered heterogeneous cores, an in-memory computing engine, and 3D-stacked High Bandwidth Memory v2 (HBM2). It also adds support enhancements to gem5, including Virtual Machine (VM) support, profiling support in the simulator using the gperf profiler, enhanced checkpointing, and file sharing between the simulated and host systems. Finally, all the extensions and enhancements are fully supported in the Full System (FS) mode of gem5-X, which makes it possible to boot and run a full Linux stack on top of the simulator, so that almost any application that runs on a Linux system can be executed using gem5-X.
To demonstrate the capabilities of gem5-X, various state-of-the-art compute-dominated and memory-dominated applications are used as case studies. The compute-intensive workloads include video encoding, video analytics, VMs in the cloud, Binary Neural Networks (BNNs), and Recurrent Neural Networks (RNNs). Memory-dominated applications, including genome sequence alignment based on Next Generation Sequencing (NGS) techniques and Convolutional Neural Networks (CNNs), are presented as case studies to demonstrate the memory sub-system extensions of gem5-X. The gem5-X exploration methodology is used to optimize architectures for these applications, achieving both performance and energy benefits. These applications and workloads serve only as case studies to showcase the capabilities of gem5-X: the platform is generic and can be used to explore and optimize architectures for any multi-threaded (or single-threaded) application that can run on a Linux-based OS.