Infoscience

conference paper

Optimizing Goodput through Sharing for Batch Analytics with Deadlines

Venkatesh, Srinivas Karthik; Sioulas, Panagiotis; Pradhan, Ahana; Subramanya, Raghunandan; Mytilinis, Ioannis; Ailamaki, Anastasia
March 25, 2024
International Conference on Extending Database Technology

Modern big data systems process not only huge volumes of data but also numerous concurrent queries. These queries usually span distributed data and must be processed within a strict deadline (e.g., for report generation or SLAs). While users expect all queries to complete within these tight deadlines, failures (which cause delays) often force systems to rely on best-effort solutions. Commonly, this means maximizing the number of queries in the batch that complete before the deadline. To address this issue, existing systems typically aim to maximize system throughput, either by enhancing single-query performance or by sharing work, while remaining oblivious to deadlines. Previous studies on real-time systems have proposed methods centered on meeting deadlines, but they miss the opportunities that arise from overlapping work among the queries in a batch. As a result, both approaches deliver limited performance. In this paper, we present a novel system called BIGSHARED, which aims to maximize the number of queries that complete by harnessing the advantages of reusing computations through work sharing. We introduce a unique deadline-conscious batch optimizer that manages the delicate balance between meeting deadlines and leveraging sharing. To further improve performance, especially in the event of failures, we incorporate an efficient fault-tolerance mechanism through sharing-conscious checkpointing. We evaluate the performance of BIGSHARED using queries from the TPC-DS benchmark on an open-source big data system. The experimental results demonstrate that, on average, BIGSHARED surpasses the optimal query-at-a-time approach by 61% and outperforms a state-of-the-art work-sharing system by 74%, showcasing the significant benefits of combining the deadline-aware and sharing-aware paradigms.
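To make the goodput objective concrete, here is a toy sketch in Python. It is not BIGSHARED's optimizer, which the abstract does not detail. It assumes a hypothetical model: each query is a set of named operators with fixed costs executed on a single worker, shared operators are computed once, there are no failures, and goodput is the number of queries that finish by one common deadline. The names `Query`, `plan_cost`, `goodput_query_at_a_time`, `goodput_shared`, and the operator costs are all illustrative inventions. The brute-force subset search merely stands in for a deadline-conscious batch optimizer and is only feasible for tiny batches.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Query:
    name: str
    operators: frozenset  # hypothetical IDs of the sub-plans this query needs


def plan_cost(ops, cost):
    """Total cost of executing a set of operators once each."""
    return sum(cost[o] for o in ops)


def goodput_query_at_a_time(queries, cost, deadline):
    """Run queries one after another, recomputing every operator per query.

    Shortest-first order maximizes the count for independent queries
    that share a single deadline.
    """
    clock, done = 0.0, 0
    for q in sorted(queries, key=lambda q: plan_cost(q.operators, cost)):
        clock += plan_cost(q.operators, cost)
        if clock > deadline:
            break  # costs are non-negative, so every later query misses too
        done += 1
    return done


def goodput_shared(queries, cost, deadline):
    """Find the largest subset whose *union* of operators fits the deadline.

    Operators common to several queries are computed once. Brute force
    over subsets, largest first: a stand-in, not a real optimizer.
    """
    for k in range(len(queries), 0, -1):
        for subset in combinations(queries, k):
            ops = frozenset().union(*(q.operators for q in subset))
            if plan_cost(ops, cost) <= deadline:
                return k
    return 0


# Hypothetical batch: three queries share an expensive scan and join;
# q4 also needs a costly private operator.
cost = {"scan_sales": 4.0, "join_date": 2.0, "agg_store": 1.0,
        "agg_item": 1.0, "agg_cust": 1.0, "big_op": 20.0}
batch = [
    Query("q1", frozenset({"scan_sales", "join_date", "agg_store"})),
    Query("q2", frozenset({"scan_sales", "join_date", "agg_item"})),
    Query("q3", frozenset({"scan_sales", "join_date", "agg_cust"})),
    Query("q4", frozenset({"scan_sales", "join_date", "big_op"})),
]
print(goodput_query_at_a_time(batch, cost, deadline=10.0))  # -> 1
print(goodput_shared(batch, cost, deadline=10.0))  # -> 3
```

In this toy batch, sharing the common scan and join lets three queries fit where only one fits in isolation. A plan that shares all four queries costs 29 units, so the search must leave q4 out to meet the deadline; choosing what to share with the deadline in mind is the balance the abstract describes.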

Type
conference paper
DOI
10.48786/edbt.2024.29
Author(s)
Venkatesh, Srinivas Karthik
Sioulas, Panagiotis (EPFL)
Pradhan, Ahana
Subramanya, Raghunandan
Mytilinis, Ioannis (École Polytechnique Fédérale de Lausanne)
Ailamaki, Anastasia (EPFL)

Date Issued

2024-03-25

ISBN of the book

78-3-89318-094-3

ISSN (of the series)

2367-2005

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
DIAS  
Event name

International Conference on Extending Database Technology

Event acronym

EDBT

Event place

Paestum, Italy

Event date

2024-03-25 - 2024-03-28

Available on Infoscience
September 25, 2024
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/241372