Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. EPFL thesis
  4. Toward timely, predictable and cost-effective data analytics
 
doctoral thesis

Toward timely, predictable and cost-effective data analytics

Borovica-Gajic, Renata  
2016

Modern industrial, government, and academic organizations are collecting massive amounts of data at an unprecedented scale and pace. The ability to perform timely, predictable and cost-effective analytical processing of such large data sets in order to extract deep insights is now a key ingredient for success. Traditional database systems (DBMS) are, however, not the first choice for servicing these modern applications, despite 40 years of database research. This is due to the fact that modern applications exhibit different behavior from the one assumed by DBMS: a) timely data exploration as a new trend is characterized by ad-hoc queries and a short user interaction period, leaving little time for DBMS to do good performance tuning, b) accurate statistics representing relevant summary information about distributions of ever increasing data are frequently missing, resulting in suboptimal plan decisions and consequently poor and unpredictable query execution performance, and c) cloud service providers - a major winner in the data analytics game due to the low cost of (shared) storage - have shifted the control over data storage from DBMS to the cloud providers, making it harder for DBMS to optimize data access. This thesis demonstrates that database systems can still provide timely, predictable and cost-effective analytical processing, if they use an agile and adaptive approach. In particular, DBMS need to adapt at three levels (to workload, data and hardware characteristics) in order to stabilize and optimize performance and cost when faced with requirements posed by modern data analytics applications. Workload-driven data ingestion is introduced with NoDB as a means to enable efficient data exploration and reduce the data-to-insight time (i.e., the time to load the data and tune the system) by doing these steps lazily and incrementally as a side-effect of posed queries as opposed to mandatory first steps. Data-driven runtime access path decision making introduced with Smooth Scan alleviates suboptimal query execution, postponing the decision on access paths from query optimization, where statistics are heavily exploited, to query execution, where the system can obtain more details about data distributions. Smooth Scan uses access path morphing from one physical alternative to another to fit the observed data distributions, which removes the need for a priori access path decisions and substantially improves the predictability of DBMS. Hardware-driven query execution introduced with Skipper enables the usage of cold storage devices (CSD) as a cost-effective solution for storing the ever increasing customer data. Skipper uses an out-of-order CSD-driven query execution model based on multi-way joins coupled with efficient cache and I/O scheduling policies to hide the non-uniform access latencies of CSD. This thesis advocates runtime adaptivity as a key to dealing with raising uncertainty about workload characteristics that modern data analytics applications exhibit. Overall, the techniques introduced in this thesis through the three levels of adaptivity (workload, data and hardware-driven adaptivity) increase the usability of database systems and the user satisfaction in the case of big data exploration, making low-cost data analytics reality.

  • Files
  • Details
  • Metrics
Loading...
Thumbnail Image
Name

EPFL_TH7140.pdf

Access type

openaccess

Size

3.94 MB

Format

Adobe PDF

Checksum (MD5)

58f3eac7c97b72b0fb540d809bb32ca6

Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés