A mean field model of work stealing in large-scale systems

In this paper, we consider a generic model of computational grids, seen as several clusters of homogeneous processors. In such systems, a key issue when designing efficient job allocation policies is to balance the workload over the different resources. We present a Markovian model for performance evaluation of such a policy, namely work stealing (idle processors steal work from others) in large-scale heterogeneous systems. Using mean field theory, we show that when the size of the system grows, it converges to a system of deterministic ordinary differential equations that allows one to compute the expectation of performance functions (such as average response times) as well as the distributions of these functions. We first study the case where all resources are homogeneous, showing in particular that work stealing is very efficient, even when the latency of steals is large. We also consider the case where distance plays a role: the system is made of several clusters, and stealing within one cluster is faster than stealing between clusters. We compare different work stealing policies based on stealing probabilities, and we show that the main factor for deciding where to steal from is the load rather than the stealing latency.
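The work stealing dynamics described in the abstract can be illustrated with a minimal discrete-time simulation sketch. This is not the paper's exact Markovian model; the parameter names (`arrival_prob`, the unit service time, and the uniform choice of victim) are simplifying assumptions made here for illustration only.

```python
import random

def simulate_work_stealing(n=100, arrival_prob=0.6, steps=10_000, seed=0):
    """Toy discrete-time sketch of work stealing among n processors.

    Each step, every processor may receive a new job, a busy processor
    completes one job, and an idle processor tries to steal one job
    from a uniformly random victim (the steal succeeds only if the
    victim holds more than one job). Returns the time-averaged queue
    length per processor, a rough proxy for response time.
    """
    rng = random.Random(seed)
    queues = [0] * n
    total_queue = 0
    for _ in range(steps):
        # Arrivals: each processor receives a job with probability arrival_prob.
        for i in range(n):
            if rng.random() < arrival_prob:
                queues[i] += 1
        # Service: every busy processor finishes exactly one job per step.
        for i in range(n):
            if queues[i] > 0:
                queues[i] -= 1
        # Stealing: idle processors pick a random victim and take one job.
        for i in range(n):
            if queues[i] == 0:
                victim = rng.randrange(n)
                if victim != i and queues[victim] > 1:
                    queues[victim] -= 1
                    queues[i] += 1
        total_queue += sum(queues)
    return total_queue / (steps * n)
```

In the mean-field regime studied in the paper, one would let `n` grow and track the fraction of processors with each queue length; this sketch only exposes the per-step stealing rule itself.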

Published in:
Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS '10), pp. 13–24
Presented at:
the ACM SIGMETRICS International Conference, New York, New York, USA, June 14–18, 2010
New York, New York, USA, ACM Press

 Record created 2011-10-14, last modified 2018-03-17
