Fast Queries Over Heterogeneous Data Through Engine Customization

Karpathiotakis, Manos; Alagiannis, Ioannis; Ailamaki, Anastasia

doi:10.14778/2994509.2994516

2016

Abstract

Industry and academia are continuously becoming more data-driven and data-intensive, relying on the analysis of a wide variety of heterogeneous datasets to gain insights. The different data models and formats pose a significant challenge to performing analysis over a combination of diverse datasets. Serving all queries using a single, general-purpose query engine is slow. On the other hand, using a specialized engine for each heterogeneous dataset increases complexity: queries touching a combination of datasets require an integration layer over the different engines. This paper presents a system design that natively supports heterogeneous data formats and also minimizes query execution times. For multi-format support, the design uses an expressive query algebra which enables operations over various data models. For minimal execution times, it uses a code generation mechanism to mimic the system and storage most appropriate to answer a query fast. We validate our design by building Proteus, a query engine which natively supports queries over CSV, JSON, and relational binary data, and which specializes itself to each query, dataset, and workload via code generation. Proteus outperforms state-of-the-art open-source and commercial systems on both synthetic and real-world workloads without being tied to a single data model or format, all while exposing users to a single query interface.
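The idea of an engine that "specializes itself to each query [and] dataset via code generation" can be illustrated with a toy sketch. This is not Proteus's implementation: it is a minimal Python stand-in that generates Python source and compiles it with `exec` in place of a real compiler backend. The function `compile_query`, the split of each format "plugin" into a per-line parse step plus a field accessor, and the example query `SELECT SUM(sum_field) WHERE filter_field > threshold` are all hypothetical choices made for illustration.

```python
import json
from typing import Callable, Iterable, Optional


def compile_query(fmt: str, filter_field: str, threshold: float,
                  sum_field: str, header: Optional[list] = None):
    """Generate and compile a query function specialized to one input format.

    Hypothetical query: SELECT SUM(sum_field) WHERE filter_field > threshold.
    Returns (compiled_function, generated_source).
    """
    # Hypothetical format "plugins": how to parse one raw line, and how to
    # reach a field in the parsed record. Both are resolved here, once, at
    # code-generation time rather than once per tuple.
    if fmt == "csv":
        if header is None:
            raise ValueError("CSV input needs a header to resolve column positions")
        prologue = "rec = line.rstrip('\\n').split(',')"

        def access(field: str) -> str:
            return f"rec[{header.index(field)}]"  # column position inlined
    elif fmt == "json":
        prologue = "rec = json.loads(line)"

        def access(field: str) -> str:
            return f"rec[{field!r}]"  # key lookup inlined
    else:
        raise ValueError(f"unsupported format: {fmt}")

    # Emit a tight scan-filter-aggregate loop with the predicate constant and
    # the field accesses baked in, so no format dispatch happens per tuple.
    src = (
        "def query(lines):\n"
        "    total = 0.0\n"
        "    for line in lines:\n"
        f"        {prologue}\n"
        f"        if float({access(filter_field)}) > {float(threshold)!r}:\n"
        f"            total += float({access(sum_field)})\n"
        "    return total\n"
    )
    namespace = {"json": json}
    exec(src, namespace)  # stand-in for compiling the generated code
    query: Callable[[Iterable[str]], float] = namespace["query"]
    return query, src


if __name__ == "__main__":
    lines = ["1,5.0\n", "2,20.0\n", "3,30.0\n"]
    q, src = compile_query("csv", "id", 1, "price", header=["id", "price"])
    print(src)       # the specialized loop, with column positions inlined
    print(q(lines))  # 50.0
```

The same query compiled with `fmt="json"` yields a different loop that calls `json.loads` and looks up keys by name, while the caller sees one interface. A real engine in this style would emit machine code rather than Python source; the sketch only shows why resolving format details before execution removes per-tuple interpretation overhead.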

Details

Title Fast Queries Over Heterogeneous Data Through Engine Customization

Author(s) Karpathiotakis, Manos ; Alagiannis, Ioannis ; Ailamaki, Anastasia

Published in Proceedings of the VLDB Endowment

Pagination 12

Volume 9

Issue 12

Pages 972-983

Conference 42nd International Conference on Very Large Data Bases, New Delhi, India, September 5-9, 2016

Date 2016

Publisher New York, Association for Computing Machinery

Keywords

OLAP; Heterogeneous Data; Data Management; Query Processing; Code Generation; Query Compilation; JSON; CSV; Query Engine

DOI https://doi.org/10.14778/2994509.2994516

Laboratories DIAS

Record appears in Peer-reviewed publications; Conference Papers; Work produced at EPFL; Published

Record creation date 2016-08-08