000174869 001__ 174869
000174869 005__ 20190316235312.0
000174869 037__ $$aREP_WORK
000174869 245__ $$aA Pragmatic Approach for Predicting the Scalability of Parallel Applications
000174869 269__ $$a2012
000174869 260__ $$bEPFL$$c2012$$aLausanne
000174869 300__ $$a22
000174869 336__ $$aReports
000174869 520__ $$aPredicting the scalability of parallel applications is becoming crucial now that the number of cores in modern CPUs doubles roughly every two years. Traditional ways of understanding the scalability of a parallel application rely on extensive experiments or detailed application models; both are very time-consuming and often hard to use. This paper presents PreSca, a pragmatic system for predicting the scalability of parallel applications. PreSca uses function approximation techniques to model scalability with an analytical performance function extracted from a set of measurements. Because it treats the application as a black box and requires no knowledge of its internals, PreSca can be applied with little effort to any parallel application. We show how PreSca can be used statically, to predict the scalability of a given application and decide which synchronization primitive scales best for it, and on-line, to dynamically assist scheduling decisions and adjust core assignment. In some sense, PreSca shows, for the first time, how function approximation can be used to predict the scalability of parallel applications in a completely general way. We extensively evaluated PreSca using a large number of parallel benchmarks, including some that use locks and some that use transactional memory, on two different multi-core systems. Our evaluation shows that PreSca produces accurate results. More specifically: (1) PreSca’s interpolations based on only 8 measurements have a 90th-percentile error below 15%, (2) PreSca’s extrapolations from measurements with up to m cores predict the performance for n <= 2m cores with errors below 20% in most cases, and (3) PreSca’s on-line scheduler determines the optimal thread count using fewer than 7 measurements, with errors below 3% on average.
000174869 6531_ $$aParallel programming
000174869 6531_ $$aPerformance prediction
000174869 6531_ $$aFunction approximation
000174869 6531_ $$aScalability
000174869 700__ $$0242986$$g173244$$aDragojevic, Aleksandar
000174869 700__ $$aGuerraoui, Rachid$$g105326$$0240335
000174869 8564_ $$uhttps://infoscience.epfl.ch/record/174869/files/perf-fun-tr_1.pdf$$zn/a$$s1037241$$yn/a
000174869 909C0 $$xU10407$$0252114$$pDCL
000174869 909CO $$ooai:infoscience.tind.io:174869$$qGLOBAL_SET$$pIC$$preport
000174869 917Z8 $$x173244
000174869 937__ $$aEPFL-REPORT-174869
000174869 973__ $$sPUBLISHED$$aEPFL
000174869 980__ $$aREPORT