Just-In-Time Data Virtualization: Lightweight Data Management with ViDa

Karpathiotakis, Manos; Alagiannis, Ioannis; Heinis, Thomas; Branco, Miguel; Ailamaki, Anastasia

2015

Abstract

As the size of data and its heterogeneity increase, traditional database system architecture becomes an obstacle to data analysis. Integrating and ingesting (loading) data into databases is quickly becoming a bottleneck in the face of massive data as well as increasingly heterogeneous data formats. Still, state-of-the-art approaches typically rely on copying and transforming data into one (or a few) repositories. Queries, on the other hand, are often ad hoc and supported by pre-cooked operators which are not adaptive enough to optimize access to data. As data formats and queries increasingly vary, there is a need to depart from the status quo of static query processing primitives and build dynamic, fully adaptive architectures. We build ViDa, a system which reads data in its raw format and processes queries using adaptive, just-in-time operators. Our key insight is the use of virtualization, i.e., abstracting data and manipulating it regardless of its original format, combined with the dynamic generation of operators. ViDa's query engine is generated just-in-time; its caches and its query operators adapt to the current query and the workload, while also treating raw datasets as its native storage structures. Finally, ViDa features a language expressive enough to support heterogeneous data models, and to which existing languages can be translated. Users therefore have the power to choose the language best suited for an analysis.
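
The two ideas named in the abstract, querying raw data in place and generating operators at query time, can be illustrated with a small sketch. This is not ViDa's implementation or API: the datasets, the function names (`scan_csv`, `scan_json_lines`, `generate_filter`, `query`), and the use of Python's `compile`/`exec` as a stand-in for just-in-time code generation are all assumptions made for illustration.

```python
import csv
import io
import json

# Hypothetical raw datasets, left in their original formats (nothing is loaded into a database).
RAW_CSV = "id,age\n1,34\n2,19\n3,52\n"
RAW_JSON = '{"id": 4, "age": 41}\n{"id": 5, "age": 17}\n'


def scan_csv(text):
    """Yield one dict per CSV row, converting the (all-integer) fields on the fly."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {key: int(value) for key, value in row.items()}


def scan_json_lines(text):
    """Yield one dict per non-empty JSON line."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


def generate_filter(field, op, constant):
    """Generate and compile a filter operator specialised to one predicate."""
    if op not in {"<", "<=", ">", ">=", "==", "!="}:
        raise ValueError(f"unsupported operator: {op}")
    # The predicate is baked into the generated source instead of being
    # interpreted per record by a generic, pre-built operator.
    source = (
        "def generated_filter(records):\n"
        "    for record in records:\n"
        f"        if record[{field!r}] {op} {constant!r}:\n"
        "            yield record\n"
    )
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["generated_filter"]


def query(sources, field, op, constant):
    """Run the same generated operator over every source, whatever its raw format."""
    operator = generate_filter(field, op, constant)
    results = []
    for scan, raw in sources:
        results.extend(operator(scan(raw)))
    return results


adults = query([(scan_csv, RAW_CSV), (scan_json_lines, RAW_JSON)], "age", ">=", 21)
print([record["id"] for record in adults])  # prints [1, 3, 4]
```

In this sketch each scan function hides the source format behind a common record abstraction, which loosely mirrors the virtualization idea, while the filter is produced only once the query is known. A real system would generate far lower-level code and would also adapt caches and access paths to the workload, which the sketch does not attempt.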

Details

Title: Just-In-Time Data Virtualization: Lightweight Data Management with ViDa

Author(s): Karpathiotakis, Manos; Alagiannis, Ioannis; Heinis, Thomas; Branco, Miguel; Ailamaki, Anastasia

Published in: Proceedings of the 7th Biennial Conference on Innovative Data Systems Research (CIDR)

Conference: 7th Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, California, USA, January 4-7, 2015

Date: 2015

Keywords: data virtualization; raw data querying; code generation; just-in-time databases; data analytics; query processing

Laboratories: DIAS

Record appears in: Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > DIAS - Data-Intensive Applications and Systems Laboratory; Peer-reviewed publications; Conference Papers; Work produced at EPFL; Published

Record creation date: 2014-12-05
