
Infoscience

Logo EPFL, École polytechnique fédérale de Lausanne
research article

NoDB: Efficient Query Execution on Raw Data Files

Alagiannis, Ioannis  
•
Borovica-Gajic, Renata  
•
Branco, Miguel
•
Idreos, Stratos
•
Ailamaki, Anastasia
2015
Communications of the ACM

As data collections grow larger, users face increasing bottlenecks in their data analysis. More data means more time to prepare and load the data into the database before executing the desired queries. Many applications, e.g., scientific data analysis and social networks, already avoid database systems because of their complexity and the increased data-to-query time, i.e., the time between getting the data and retrieving its first useful results. In many applications, data collections grow rapidly, even daily, and this data deluge will only increase: we expect to have far more data than we can move or store, let alone analyze. Here we present the design and roadmap of a new paradigm in database systems, called NoDB, which does not require data loading while still maintaining the whole feature set of a modern database system. In particular, we show how to make raw data files first-class citizens, fully integrated with the query engine. Through our design and the lessons learned by implementing the NoDB philosophy over a modern DBMS, we discuss the fundamental limitations as well as the strong opportunities that such a research path brings. We identify performance bottlenecks specific to in situ processing, namely the repeated parsing and tokenizing overhead and the expensive data type conversion. To address these problems, we introduce an adaptive indexing mechanism that maintains positional information to provide efficient access to raw data files, together with a flexible caching structure. We conclude that NoDB systems are feasible to design and implement over modern DBMSs, bringing an unprecedented positive effect on usability and performance.
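The two mechanisms named in the abstract, positional information that remembers where fields sit in a raw file and a cache of already-converted values, can be illustrated with a toy sketch. This is not the authors' implementation: the class `RawCsvTable`, its methods, and the simplifications (comma-separated text with no quoting, integer-only columns, offsets stored for a whole column at once) are hypothetical and chosen only to make the idea concrete.

```python
class RawCsvTable:
    """Toy in situ table over CSV text (hypothetical; no quoting, int columns only)."""

    def __init__(self, text: str, delimiter: str = ","):
        self.text = text
        self.delimiter = delimiter
        self.positional_map = {}  # column -> list of (start, end) offsets, one per row
        self.cache = {}           # column -> list of already-converted int values
        self.rows_tokenized = 0   # number of row tokenizations performed (for illustration)

    def _row_spans(self):
        # Yield the (start, end) offsets of each row, excluding the line terminator.
        start = 0
        for line in self.text.splitlines(keepends=True):
            content = line.rstrip("\r\n")
            yield start, start + len(content)
            start += len(line)

    def _learn_column(self, col: int) -> None:
        # Tokenize every row up to field `col` and remember that field's offsets.
        spans = []
        for row_start, row_end in self._row_spans():
            self.rows_tokenized += 1
            field_start = row_start
            for _ in range(col):
                # Skip one delimiter per preceding field (ValueError if the row is too short).
                field_start = self.text.index(self.delimiter, field_start, row_end) + 1
            field_end = self.text.find(self.delimiter, field_start, row_end)
            if field_end == -1:
                field_end = row_end  # last field on the row
            spans.append((field_start, field_end))
        self.positional_map[col] = spans

    def column(self, col: int) -> list[int]:
        # Cache hit: values were already parsed and converted.
        if col in self.cache:
            return self.cache[col]
        # Positional-map miss: tokenize the raw rows once to learn field offsets.
        if col not in self.positional_map:
            self._learn_column(col)
        # Jump directly to each field via the stored offsets, then convert.
        values = [int(self.text[s:e]) for s, e in self.positional_map[col]]
        self.cache[col] = values
        return values


if __name__ == "__main__":
    table = RawCsvTable("1,10,100\n2,20,200\n3,30,300\n")
    print(table.column(1))       # [10, 20, 30]
    print(table.rows_tokenized)  # 3
    table.cache.clear()          # drop converted values; offsets survive
    print(table.column(1))       # [10, 20, 30]
    print(table.rows_tokenized)  # 3: offsets reused, no re-tokenizing
```

Clearing the cache while keeping the positional map separates the two costs the abstract identifies: the stored offsets avoid repeated tokenizing, and the cache avoids repeated type conversion. Both structures in this sketch are unbounded and per-column; a more realistic design would presumably bound their memory and let queries on unseen columns start from the nearest known offset, which this sketch does not attempt.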

Type
research article
DOI
10.1145/2830508
Web of Science ID

WOS:000365237800031

Author(s)
Alagiannis, Ioannis  
Borovica-Gajic, Renata  
Branco, Miguel
Idreos, Stratos
Ailamaki, Anastasia  
Date Issued

2015

Publisher

Association for Computing Machinery

Published in
Communications of the ACM
Volume

58

Issue

12

Start page

112

End page

121

Subjects

Adaptive data loading

•

In situ querying

•

Positional map

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
DIAS  
Available on Infoscience
November 24, 2015
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/120758
  • Contact
  • infoscience@epfl.ch


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.