A Pragmatic Approach for Predicting the Scalability of Parallel Applications

Predicting the scalability of parallel applications is becoming crucial now that the number of cores in modern CPUs doubles roughly every two years. Traditional ways to get some understanding of the scalability of a parallel application rely on extensive experiments or detailed application models. Both are very time consuming and often hard to use. This paper presents PreSca, a pragmatic system for predicting the scalability of parallel applications. PreSca uses function approximation techniques to model scalability with an analytical performance function extracted from a set of measurements. By considering the application as a black-box without requiring any knowledge about its internals, PreSca can be applied with little ef- fort to any parallel application. We show how PreSca can be used statically to predict the scalability of a given application and decide which synchronization primitive scales best for it as well as how it can be used on-line to dynamically assist scheduling decisions and adjust core assignment. In some sense, PreSca shows, for the first time, how function approximation can be used to predict the scalability of parallel applications in a completely general way. We extensively evaluated PreSca using a large number of parallel benchmarks, including some that use locks and some that use transactional memory. We also consider two different multi- core systems. Our evaluation shows that PreSca produces accurate results. More specifically: (1) PreSca’s interpolations based on only 8 measurements have 90th percentile of error lower than 15%, (2) PreSca’s extrapolations using measurements with up to m cores predict the performance for n <= 2m cores with errors lower than 20% in most cases, and (3) PreSca’s on-line scheduler determines the optimal thread count using fewer than 7 measurements with errors lower than 3% on average.

Related material