Near Optimal Work-Stealing Tree Scheduler for Highly Irregular Data-Parallel Workloads

We present a work-stealing algorithm for runtime scheduling of data-parallel operations in the context of shared-memory architectures on data sets with highly-irregular workloads that are not known a priori to the scheduler. This scheduler can parallelize loops and operations expressible with a parallel reduce or a parallel scan. The scheduler is based on the work-stealing tree data structure, which allows workers to decide on the work division in a lock-free, workload-driven manner and attempts to minimize the amount of communication between them. A significant effort is given to showing that the algorithm has the least possible amount of overhead. We provide an extensive experimental evaluation, comparing the advantages and shortcomings of different data-parallel schedulers in order to combine their strengths. We show specific workload distribution patterns appearing in practice for which different schedulers yield suboptimal speedup, explaining their drawbacks and demonstrating how the work-stealing tree scheduler overcomes them. We thus justify our design decisions experimentally, but also provide a theoretical background for our claims.


Editeur(s):
Cascaval, C
Montesinos, P
Publié dans:
Languages And Compilers For Parallel Computing, Lcpc 2013, 8664, 55-86
Présenté à:
26th International Workshop on Languages and Compilers for Parallel Computing (LCPC), San Jose, CA, SEP 25-27, 2013
Année
2014
Publisher:
Berlin, Springer-Verlag Berlin
ISSN:
0302-9743
ISBN:
978-3-319-09967-5
978-3-319-09966-8
Laboratoires:




 Notice créée le 2014-12-30, modifiée le 2018-09-13


Évaluer ce document:

Rate this document:
1
2
3
 
(Pas encore évalué)