Alpine: Efficient In-Situ Data Exploration in the Presence of Updates

Anagnostou, Antonios; Olma, Matthaios; Ailamaki, Anastasia

doi:10.1145/3035918.3058743

conference paper

May 9, 2017

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data

SIGMOD International Conference on Management of Data

Ever-growing data collections create the need for brief explorations of the available data to extract relevant information before decision making becomes necessary. In this context of data exploration, current data analysis solutions struggle to quickly pinpoint useful information in data collections. One major reason is that loading data into a DBMS without knowing which part of it will actually be useful is a major bottleneck. To remove this bottleneck, state-of-the-art approaches perform queries in situ, thus avoiding the loading overhead. In situ query engines, however, are index-oblivious and lack sophisticated techniques to reduce the amount of data to be accessed. Furthermore, applications constantly generate fresh data and update the existing raw data files, whereas state-of-the-art in situ approaches support only append-like workloads. In this demonstration, we showcase the efficiency of adaptive indexing and partitioning techniques for analytical queries in the presence of updates. We demonstrate an online partitioning and indexing tuner for in situ querying which plugs into a query engine and offers support for fast queries over raw data files. We present Alpine, our prototype implementation, which combines the tuner with a query executor incorporating in situ query techniques to provide efficient raw data access. We will visually demonstrate how Alpine incrementally and adaptively builds auxiliary data structures and indexes over raw data files and how it adapts its behavior as a side effect of updates in the raw data files.