Infoscience

doctoral thesis

Adaptive Query Processing on Raw Data Files

Alagiannis, Ioannis  
2015

Business and scientific applications accumulate data at an increasing pace, and this growth has already started to exceed the capabilities of database management systems (DBMS). In a typical DBMS usage scenario, the user must define a schema, load the data, and tune the system for an expected workload before submitting any queries. Copying data into a database is a significant investment of time and resources, and in many cases it is unnecessary or, given the explosive data growth, no longer feasible in practice. Additionally, the way a DBMS stores and organizes data during loading determines how the data will be accessed for a given workload and thus the maximum achievable performance. Selecting the underlying data layout (row-store or column-store) is a critical first tuning decision that cannot be changed afterwards. Query analysis today, however, is not static; it evolves as queries change. Hence, static design decisions can be suboptimal.

In this thesis, we advocate in situ query processing as the principal way to manage data in a database. To address the heavy initialization cost that comes with data loading, we reconsider the data loading phase and redesign traditional query processing architectures to work efficiently over raw data files. We present adaptive data loading as an alternative to traditional full a priori data loading, and we explore the potential of in situ query processing in the context of current DBMS architectures. We identify performance bottlenecks specific to in situ processing and introduce an adaptive indexing mechanism (the positional map) that maintains positional information to provide efficient access to raw data files, together with a flexible caching structure and techniques for collecting statistics over raw data files. Moreover, we design a flexible query engine that is not built around a single storage layout but can exploit different storage layouts and data execution strategies in a single engine. During query processing, it decides which design fits the input queries and adapts the underlying data storage accordingly. By applying code generation techniques, we dynamically generate access operators tailored to specific classes of queries.

This thesis revises the traditional paradigm of loading, tuning, and then querying by using in situ query processing as the principal way to minimize data-to-query time. We show that raw data files should not be considered "outside" the DBMS and that full data loading should not be a requirement for exploiting database technology. On the contrary, techniques specifically tailored to overcome the limitations of accessing raw data files can eliminate the data loading overhead, making raw data files a first-class citizen, fully integrated with the query engine. The proposed roadmap can provide guidance on how to convert any traditional DBMS into an efficient in situ query engine.
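The positional map mentioned in the abstract can be illustrated with a toy sketch. The code below is not taken from the thesis: the class name `PositionalMap`, its methods, and the counter `delimiters_scanned` are hypothetical, and the sketch assumes a well-formed, unquoted, comma-delimited text file held in memory, with column indexes that stay within each row. It only shows the general idea of remembering where fields start, so that later queries can jump into the raw data instead of tokenizing each row from its beginning.

```python
class PositionalMap:
    """Toy positional map over raw delimited text (illustrative assumption only)."""

    def __init__(self, raw, delimiter=","):
        self.raw = raw                  # full contents of the raw file
        self.delimiter = delimiter      # single-character field separator
        self.rows = []                  # start offset of every row
        pos = 0
        for line in raw.splitlines(keepends=True):
            self.rows.append(pos)
            pos += len(line)
        self.offsets = {}               # (row, col) -> start offset of that field
        self.delimiters_scanned = 0     # tokenizing work done so far

    def field(self, row, col):
        """Return field `col` of row `row`, reusing known offsets when possible."""
        # Start from the closest field at or before `col` whose offset is known.
        start_col, pos = 0, self.rows[row]
        for c in range(col, 0, -1):
            if (row, c) in self.offsets:
                start_col, pos = c, self.offsets[(row, c)]
                break
        # Tokenize forward only as far as needed and remember each offset found.
        for c in range(start_col, col):
            pos = self.raw.index(self.delimiter, pos) + 1
            self.delimiters_scanned += 1
            self.offsets[(row, c + 1)] = pos
        # Read the field up to the next delimiter or the end of the line.
        end = pos
        while end < len(self.raw) and self.raw[end] not in (self.delimiter, "\n", "\r"):
            end += 1
        return self.raw[pos:end]


if __name__ == "__main__":
    data = "id,name,city,zip\n1,ann,bern,3000\n2,bob,basel,4000\n"
    pmap = PositionalMap(data)
    print(pmap.field(1, 3), pmap.delimiters_scanned)  # first access tokenizes the row
    print(pmap.field(1, 2), pmap.delimiters_scanned)  # second access reuses stored offsets
```

In this sketch the map is populated as a side effect of answering queries, which loosely mirrors the adaptive, query-driven behaviour the abstract describes. A real system would also have to bound the memory used by the map and handle quoting and multi-byte encodings.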

Type
doctoral thesis
DOI
10.5075/epfl-thesis-6644
Author(s)
Alagiannis, Ioannis  
Advisors
Ailamaki, Anastasia  
Jury

Prof. Babak Falsafi (president); Prof. Anastasia Ailamaki (thesis director); Prof. James Larus, Prof. Glenn Paulley, Prof. Neoklis Polyzotis (examiners)

Date Issued

2015

Publisher

EPFL

Publisher place

Lausanne

Public defense date

2015-09-16

Thesis number

6644

Total pages

183

Subjects

Adaptive loading • in situ querying • data files • positional map • adaptive storage • adaptive hybrids • dynamic operators • databases • data analytics

EPFL units
DIAS  
Faculty
IC  
School
IIF  
Doctoral School
EDIC  
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/117697

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved