Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. Conferences, Workshops, Symposiums, and Seminars
  4. Sampling-Based AQP in Modern Analytical Engines
 
conference paper

Sampling-Based AQP in Modern Analytical Engines

Sanca, Viktor  
•
Ailamaki, Anastasia  
2022
Data Management on New Hardware
SIGMOD/PODS '22: International Conference on Management of Dataent on New Hardware

As the data volume grows, reducing the query execution times remains an elusive goal. While approximate query processing (AQP) techniques present a principled method to trade off accuracy for faster queries in analytics, the sample creation is often considered a second-class citizen. Modern analytical engines optimized for high bandwidth media and multi-core architectures only exacerbate existing inefficiencies, resulting in prohibitive query-time online sampling and longer preprocessing times in offline AQP systems. We demonstrate that the sampling operators can be practical in modern scale-up analytical systems. First, we evaluate three common sampling methods, identify algorithmic bottlenecks, and propose hardware-conscious optimizations. Second, we reduce the performance penalties of the added processing and sample materialization through system-aware operator design and compare the sample creation time to the matching relational operators of an in-memory JIT-compiled engine. The cost of data reduction with materialization is up to 2.5x of the equivalent group-by in the case of stratified sampling and virtually free (∼1x) for reasonable sample sizes of other strategies. As query processing starts to dominate the execution time, the gap between online and offline AQP methods diminishes.

  • Files
  • Details
  • Metrics
Loading...
Thumbnail Image
Name

3533737.3535095.pdf

Type

Postprint

Version

Accepted version

Access type

restricted

License Condition

copyright

Size

698.36 KB

Format

Adobe PDF

Checksum (MD5)

f3426e142a036ba08b8d887ac492394a

Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés