000208856 001__ 208856
000208856 005__ 20190317000212.0
000208856 020__ $$a978-1-931971-22-5
000208856 037__ $$aCONF
000208856 245__ $$aHawk: Hybrid Datacenter Scheduling
000208856 260__ $$bUSENIX Association$$c2015-07-08
000208856 269__ $$a2015-07-08
000208856 300__ $$a12
000208856 336__ $$aConference Papers
000208856 520__ $$aThis paper addresses the problem of efficient scheduling of large clusters under high load and heterogeneous workloads. A heterogeneous workload typically consists of many short jobs and a small number of large jobs that consume the bulk of the cluster's resources. Recent work advocates distributed scheduling to overcome the limitations of centralized schedulers for large clusters with many competing jobs. Such distributed schedulers are inherently scalable, but may make poor scheduling decisions because of limited visibility into the overall resource usage in the cluster. In particular, we demonstrate that under high load, short jobs can fare poorly with such a distributed scheduler. We propose instead a new hybrid centralized/distributed scheduler, called Hawk. In Hawk, long jobs are scheduled using a centralized scheduler, while short ones are scheduled in a fully distributed way. Moreover, a small portion of the cluster is reserved for the use of short jobs. In order to compensate for the occasional poor decisions made by the distributed scheduler, we propose a novel and efficient randomized work-stealing algorithm. We evaluate Hawk using a trace-driven simulation and a prototype implementation in Spark. In particular, using a Google trace, we show that under high load, compared to the purely distributed Sparrow scheduler, Hawk improves the 50th and 90th percentile runtimes by 80% and 90% for short jobs and by 35% and 10% for long jobs, respectively. Measurements of a prototype implementation using Spark on a 100-node cluster confirm the results of the simulation.
000208856 6531_ $$adata center scheduling
000208856 700__ $$aDelgado, Pamela
000208856 700__ $$0247561$$aDinu, Florin$$g173698
000208856 700__ $$0249026$$aKermarrec, Anne-Marie$$g184811
000208856 700__ $$0243160$$aZwaenepoel, Willy$$g155705
000208856 7112_ $$aThe 2015 USENIX Annual Technical Conference$$cSanta Clara, CA, USA$$dJuly 8-10, 2015
000208856 7112_ $$a2015 USENIX Annual Technical Conference (USENIX ATC '15)$$cSanta Clara, CA, USA$$dJuly 8-10, 2015
000208856 773__ $$q499-510$$tProceedings of the 2015 USENIX Annual Technical Conference
000208856 8560_ $$fpamela.delgado@epfl.ch
000208856 8564_ $$s2364248$$uhttps://infoscience.epfl.ch/record/208856/files/atc15-paper-delgado_update.pdf$$yPublisher's version$$zPublisher's version
000208856 909C0 $$0252226$$pLABOS$$xU10700
000208856 909CO $$ooai:infoscience.tind.io:208856$$pconf$$pIC$$qGLOBAL_SET
000208856 917Z8 $$x173698
000208856 917Z8 $$x199060
000208856 937__ $$aEPFL-CONF-208856
000208856 973__ $$aEPFL$$rREVIEWED
000208856 980__ $$aCONF