Optimizing Goodput through Sharing for Batch Analytics with Deadlines
Modern big data systems process not only huge volumes of data but also numerous concurrent queries. These queries usually span distributed data and must be processed within a strict deadline (e.g., for report generation or to satisfy an SLA). While users expect all queries to complete within these tight deadlines, failures cause delays, so systems often fall back on best-effort solutions. Commonly, this means maximizing the number of queries in the batch that complete before the deadline. Existing systems typically aim to maximize system throughput, either by enhancing single-query performance or by sharing work, while remaining oblivious to deadlines. Previous studies on real-time systems have proposed methods centered on meeting deadlines, but they miss the opportunities that arise from overlapping work among the queries in a batch. As a result, neither line of work maximizes the number of queries that finish on time. In this paper, we present a novel system called BIGSHARED, which aims to maximize the number of queries that complete before the deadline by reusing computations through work-sharing. We introduce a deadline-conscious batch optimizer that manages the delicate balance between meeting deadlines and leveraging sharing. To further improve performance, especially in the event of failures, we incorporate an efficient fault-tolerance mechanism based on sharing-conscious checkpointing. We evaluate BIGSHARED using queries from the TPC-DS benchmark on an open-source big data system. The experimental results demonstrate that, on average, BIGSHARED surpasses the optimal query-at-a-time approach by 61% and outperforms a state-of-the-art work-sharing system by 74%, showcasing the significant benefits of combining the deadline-aware and sharing-aware paradigms.
openaccess
CC BY