Scalable Simulation Methodologies for Many-Core Heterogeneous Systems

As emerging compute-intensive, data-parallel workloads grow in complexity and performance demands, many-core systems are becoming increasingly common in computer design. Fast and scalable simulation methods are needed to make meaningful predictions about design alternatives, to enable early software development, and to assess the performance of a system before the real hardware is available. However, simulation technology for large-scale many-core systems is lagging behind. Most existing simulation techniques are slow and complex, or scale poorly and are costly to develop, which leads to unacceptable performance when simulating a large number of cores. The new challenges posed by emerging many-core systems demand new methods for simulating these platforms. This thesis investigates techniques for improving the performance of parallel simulators that model many-core heterogeneous architectures. With recent advances in the programmability and processing power of graphics hardware, general-purpose graphics processing units (GPGPUs) are becoming a popular host platform for solving a broad range of computationally expensive problems. The ready availability of low-cost, highly parallel GPGPU platforms presents an opportunity for accelerating architectural simulation. In this thesis, I propose and investigate the novel idea of using GPGPUs as a host platform for accelerating the simulation of many-core systems. GPGPUs are best suited to highly parallel workloads with fine-grained multi-threading and high computational intensity. Architectural simulation, on the other hand, is an extremely complex task with numerous dependencies and rules. I therefore first determine the feasibility of accelerating the simulation of various target many-core architectures at different levels of abstraction, taking into account the particularities of prevalent GPU architectures and their programming models.
Based on this study, I select as the simulation target a heterogeneous architecture in which a multi-core CPU is connected to many-core coprocessors, and I accelerate its simulation on GPGPUs. I propose ways to exploit the ample parallelism in many-core simulation and present a comprehensive framework for developing an architecture-independent, fast, scalable, easy-to-use, full-system simulator. In particular, I outline some of the challenges faced and the insights gained with the proposed approach. I present specific methods that make the most of the GPU's compute throughput and memory bandwidth, allowing optimized parallelization with minimal synchronization. The results show high scalability and a high speedup when simulating up to eight thousand cores.

