Infoscience

 
conference paper

Optimizing Goodput through Sharing for Batch Analytics with Deadlines

Venkatesh, Srinivas Karthik • Sioulas, Panagiotis • Pradhan, Ahana • et al.
March 25, 2024
International Conference on Extending Database Technology

Modern big data systems process not only huge volumes of data but also numerous concurrent queries. These queries usually span distributed data and must be processed within a strict deadline (e.g., for report generation or to meet an SLA). While users expect all queries to complete within these tight deadlines, failures cause delays and often force a fallback to best-effort solutions. Commonly, this means maximizing the number of queries in the batch that complete before the deadline. To address this issue, existing systems typically aim to maximize system throughput, either by enhancing single-query performance or by sharing work, while remaining oblivious to deadlines. Previous studies on real-time systems have proposed methods centered on meeting deadlines, but they miss the opportunities that arise from overlapping work among queries in the batch. As a result, both lines of work deliver limited performance. In this paper, we present a novel system called BIGSHARED, which aims to maximize the number of queries that complete before the deadline by reusing computations through work-sharing. We introduce a unique deadline-conscious batch optimizer that manages the delicate balance between meeting deadlines and leveraging sharing. To further improve performance, especially in the event of failures, we incorporate an efficient fault-tolerance mechanism based on sharing-conscious checkpointing. We evaluate BIGSHARED using queries from the TPC-DS benchmark on an open-source big data system. The experimental results demonstrate that, on average, BIGSHARED surpasses the optimal query-at-a-time approach by 61% and outperforms a state-of-the-art work-sharing system by 74%, showcasing the significant benefits of combining the deadline-aware and sharing-aware paradigms.

Name: paper-86.pdf
Type: Main Document
Version: Accepted version
Access type: openaccess
License Condition: CC BY
Size: 1.09 MB
Format: Adobe PDF
Checksum (MD5): 0df797f90060ed69c72cd87574204f2a

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved