Abstract

Timely insights lead to business growth and scientific breakthroughs but require analytical engines that cope with the ever-increasing data processing needs. Analytical engines relied on rapid CPU improvements, yet the end of Dennard scaling stopped the free lunch and resulted in a heterogeneous hardware landscape that challenges existing analytical engines. First, each device has its own specialized execution model and architecture, impeding interoperability. Second, the diversity in device microarchitectures requires a diverse range of optimizations. Third, while the multitude of devices provides additional acceleration opportunities, moving the data across devices is costly. Finally, with networking bandwidths similar to intra-server device connections, the server boundaries are blurred, providing optimization opportunities and requiring careful orchestration to avoid wasting resources. In this thesis, we aim for engines tailored for heterogeneous hardware: abstracting out hardware heterogeneity to enable efficient execution across the devices despite their diversity. To this end, we design and implement techniques that are i) scalable through accelerator-level parallelism, and ii) efficient through query execution customization to the underlying accelerators and data transfer paths. Regarding scalability, we propose a unifying execution model and a throughput-oriented system view to enable on-the-fly multi-device orchestration without requiring knowledge about hardware specifics. In addition, by decoupling data and control flow, this thesis enables late and direct data transfers within and across servers. Regarding efficiency, we provide an execution model that limits operator instances to specific devices, enabling operators to customize themselves to a single device without concern for multi-device effects. In addition, by providing interconnect-aware transfer methods, this thesis minimizes the cost of offloading operations across devices. 
This thesis redesigns analytical engines to exploit hardware heterogeneity. Instead of trading hardware efficiency for accelerator-level scalability, this thesis embraces heterogeneity. Our design enables scalable analytics across CPU-GPU hardware and achieves the analytical performance of optimally combining a CPU- and a GPU-optimized engine. As a result, users benefit from faster insights without requiring large clusters of machines. The proposed accelerator-centric design paves the way toward analytical engines that benefit from hardware improvements across the hardware spectrum, instead of relying on single-processor advancements.
