Job-aware Scheduling in Eagle: Divide and Stick to Your Probes

Delgado, Pamela; Didona, Diego; Dinu, Florin; Zwaenepoel, Willy

doi:10.1145/2987550.2987563

conference paper

October 5, 2016

Proceedings of the Seventh ACM Symposium on Cloud Computing

ACM Symposium on Cloud Computing 2016 (SoCC'16)

We present Eagle, a new hybrid data center scheduler for data-parallel programs. Eagle dynamically divides the nodes of the data center into partitions for the execution of long and short jobs, thereby avoiding head-of-line blocking. Furthermore, it provides job awareness and avoids stragglers through a new technique called Sticky Batch Probing (SBP). The dynamic partitioning of the data center nodes is accomplished by a technique called Succinct State Sharing (SSS), in which the distributed schedulers are informed of the locations where long jobs are executing. SSS is particularly easy to implement with a hybrid scheduler, in which the centralized scheduler places long jobs. With SBP, when a distributed scheduler places a probe for a job on a node, the probe stays there until all tasks of the job have been completed. When it finishes executing a task corresponding to probe P, the node may choose to execute another task corresponding to P rather than a task corresponding to the next probe P' in its queue. We use SBP in combination with a distributed approximation of Shortest Remaining Processing Time (SRPT) with starvation prevention. We have implemented Eagle as a Spark plugin, and we have measured job completion times for a subset of the Google trace on a 100-node cluster for a variety of cluster loads. We provide simulation results for larger clusters and different traces, and for comparison with other scheduling disciplines. We show that Eagle outperforms other state-of-the-art scheduling solutions at most percentiles, and is more robust against mis-estimation of task duration.
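The interaction between SBP and an SRPT-style ordering can be illustrated with a toy simulation. The following is a minimal sketch under stated assumptions, not Eagle's implementation: the names `Job` and `run_node` are hypothetical, there is a single worker node with one execution slot, each probe is represented by the job it belongs to, remaining work is taken to be the sum of the job's not-yet-started task durations, and starvation prevention is left out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical job model: unstarted tasks wait at the scheduler."""
    name: str
    pending: deque = field(default_factory=deque)  # durations of unstarted tasks

    def remaining_work(self):
        # Assumption: remaining processing time is the sum of pending durations.
        return sum(self.pending)

    def next_task(self):
        """Hand out the next pending task duration, or None if none are left."""
        return self.pending.popleft() if self.pending else None


def run_node(probes):
    """Drain the probe queue of one single-slot node.

    Returns the (job name, task duration) pairs in execution order.
    """
    queue = list(probes)  # one probe per job, represented by the job itself
    executed = []
    while queue:
        # SRPT approximation: serve the probe whose job has the least work left.
        job = min(queue, key=lambda j: j.remaining_work())
        task = job.next_task()
        if task is None:
            # The job has no more tasks to hand out, so its probe is dropped.
            queue.remove(job)
            continue
        executed.append((job.name, task))
        # Sticky behavior: the probe stays queued after the task finishes,
        # so the node can keep pulling tasks of the same job.
    return executed


if __name__ == "__main__":
    a = Job("A", deque([2, 2]))
    b = Job("B", deque([1, 1, 1]))
    for name, duration in run_node([a, b]):
        print(name, duration)
```

In this sketch, job B (three units of remaining work) is served before job A (four units), and the node keeps pulling B's tasks until none remain before it turns to A's probe. In a real cluster several nodes would pull tasks from the same job concurrently, which this single-node model does not capture.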

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/129080

Name: job-aware-scheduling-fixed.pdf
Access type: openaccess
Size: 589.82 KB
Format: Adobe PDF
Checksum (MD5): 7ec4c5b64ad7599898951615e80bd54c