Conference paper

A mean field model of work stealing in large-scale systems

In this paper, we consider a generic model of computational grids, seen as several clusters of homogeneous processors. In such systems, a key issue when designing efficient job allocation policies is to balance the workload over the different resources. We present a Markovian model for performance evaluation of such a policy, namely work stealing (idle processors steal work from others) in large-scale heterogeneous systems. Using mean field theory, we show that when the size of the system grows, it converges to a system of deterministic ordinary differential equations that allows one to compute the expectation of performance functions (such as average response times) as well as the distributions of these functions. We first study the case where all resources are homogeneous, showing in particular that work stealing is very efficient, even when the latency of steals is large. We also consider the case where distance plays a role: the system is made of several clusters, and stealing within one cluster is faster than stealing between clusters. We compare different work stealing policies, based on stealing probabilities, and we show that the main factor for deciding where to steal from is the load rather than the stealing latency.
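To illustrate how a mean field limit turns a large work-stealing system into a set of ordinary differential equations, here is a minimal sketch of a toy model. It is not the model of the paper: it assumes a single homogeneous cluster, Poisson arrivals at rate `lam` per processor, exponential service at rate 1, and instantaneous steals (no latency), where a processor that finishes its last job picks one victim uniformly at random and takes a job if the victim holds at least two. The state `s[i]` is the fraction of processors holding at least `i` jobs; the truncation level `K`, the step size `dt`, and all function names are choices made for this illustration.

```python
def mean_field_step(s, lam, dt, steal=True):
    """One explicit Euler step for the tail fractions s[i]
    (fraction of processors holding at least i jobs)."""
    K = len(s) - 1  # s[0] == 1 always; queue lengths above K are truncated
    # Rate at which processors finish their last job and become thieves.
    thieves = (s[1] - s[2]) if steal else 0.0
    new = s[:]
    for i in range(1, K + 1):
        nxt = s[i + 1] if i < K else 0.0
        arrivals = lam * (s[i - 1] - s[i])
        if i == 1:
            # A processor with exactly one job becomes empty unless its
            # steal attempt succeeds, which happens with probability s[2].
            stay_empty = (1.0 - s[2]) if steal else 1.0
            departures = (s[1] - s[2]) * stay_empty
        else:
            # Processors with exactly i jobs lose one job by service
            # (rate 1) or by being robbed (rate `thieves`).
            departures = (s[i] - nxt) * (1.0 + thieves)
        new[i] = s[i] + dt * (arrivals - departures)
    return new


def solve(lam, K=40, dt=0.02, horizon=300.0, steal=True):
    """Integrate the toy ODE from an empty system up to time `horizon`."""
    s = [1.0] + [0.0] * K
    for _ in range(int(horizon / dt)):
        s = mean_field_step(s, lam, dt, steal)
    return s


def mean_jobs(s):
    """Average number of jobs per processor: the sum of the tail fractions."""
    return sum(s[1:])


if __name__ == "__main__":
    lam = 0.7
    with_steal = solve(lam, steal=True)
    without = solve(lam, steal=False)
    print("mean jobs per processor, stealing:   ", mean_jobs(with_steal))
    print("mean jobs per processor, no stealing:", mean_jobs(without))
```

By Little's law, the average response time is the mean number of jobs per processor divided by `lam`, so comparing the two printed values gives a rough sense of how much stealing helps in this simplified setting. Modelling steal latency or several clusters, as the paper does, would require additional state variables that this sketch leaves out.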
