Fast Queries Over Heterogeneous Data Through Engine Customization

Ailamaki, Anastasia

doi:10.14778/2994509.2994516

conference paper

Karpathiotakis, Manos • Alagiannis, Ioannis • Ailamaki, Anastasia

2016

Proceedings of the VLDB Endowment

42nd International Conference on Very Large Data Bases

Industry and academia are continuously becoming more data-driven and data-intensive, relying on the analysis of a wide variety of heterogeneous datasets to gain insights. The different data models and formats pose a significant challenge to analyzing a combination of diverse datasets. Serving all queries with a single, general-purpose query engine is slow. On the other hand, using a specialized engine for each heterogeneous dataset increases complexity: queries that touch a combination of datasets require an integration layer over the different engines. This paper presents a system design that natively supports heterogeneous data formats while minimizing query execution times. For multi-format support, the design uses an expressive query algebra that enables operations over various data models. To minimize execution times, it uses a code generation mechanism to mimic the system and storage most appropriate for answering a query fast. We validate our design by building Proteus, a query engine that natively supports queries over CSV, JSON, and relational binary data, and that specializes itself to each query, dataset, and workload via code generation. Proteus outperforms state-of-the-art open-source and commercial systems on both synthetic and real-world workloads without being tied to a single data model or format, all while exposing a single query interface to users.
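The idea of one query interface backed by per-query, per-format generated code can be illustrated with a toy sketch. This is not Proteus's implementation: the function names (`generate_sum`, `query_sum`), the choice of a sum aggregate, the line-delimited JSON layout, and the use of Python's `exec` as the code generation step are all assumptions made for illustration only.

```python
import csv
import io
import json


def generate_sum(fmt, field):
    """Build and compile a function that sums one field of one format.

    Hypothetical helper: the function source is assembled per query, so the
    format-specific access path is chosen once at generation time instead of
    being re-decided for every record.
    """
    if fmt == "csv":
        body = (
            "    reader = csv.reader(io.StringIO(text))\n"
            "    header = next(reader)\n"
            f"    idx = header.index({field!r})\n"
            "    total = 0.0\n"
            "    for row in reader:\n"
            "        total += float(row[idx])\n"
            "    return total\n"
        )
    elif fmt == "json":
        # Assumes one JSON object per line.
        body = (
            "    total = 0.0\n"
            "    for line in text.splitlines():\n"
            "        if line.strip():\n"
            f"            total += float(json.loads(line)[{field!r}])\n"
            "    return total\n"
        )
    else:
        raise ValueError(f"unsupported format: {fmt}")

    source = "def specialized_sum(text):\n" + body
    namespace = {"csv": csv, "io": io, "json": json}
    exec(source, namespace)  # compile the generated source
    return namespace["specialized_sum"]


def query_sum(datasets, field):
    """Single entry point: sum `field` across datasets of mixed formats."""
    return sum(generate_sum(fmt, field)(text) for fmt, text in datasets)


csv_data = "id,price\n1,10.5\n2,4.5\n"
json_data = '{"id": 3, "price": 5.0}\n{"id": 4, "price": 2.0}\n'
total = query_sum([("csv", csv_data), ("json", json_data)], "price")
print(total)
```

The caller only names the field and the datasets; the format-specific parsing lives entirely in the generated function, which is the property the abstract attributes to a single query interface over specialized code.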

Name

proteus-vldb16.pdf

Type

Publisher's Version

Version

http://purl.org/coar/version/c_970fb48d4fbd8a85

Access type

openaccess

Size

2.82 MB

Format

Adobe PDF

Checksum (MD5)

34e167e11cf712494e30aed952b2d04a