Schaeli, B.
Gerlach, S.
Hersch, R.D.
A Simulator for Adaptive Parallel Applications
J. Computer System ScienceDirect
doi:10.1016/j.physletb.2003.10.071
Adaptive parallel application simulation
Dynamic efficiency
Sensitivity analysis
Partial direct execution
2007
Dynamically allocating computing nodes to parallel applications is a promising technique for improving the utilization of cluster resources. Detailed simulations can help identify allocation strategies and problem decomposition parameters that increase the efficiency of parallel applications. We describe a simulation framework supporting dynamic node allocation which, given a simple cluster model, predicts the running time of parallel applications, taking CPU and network sharing into account. Simulations can be carried out without modifying the application code. Partial direct execution reduces simulation times and memory requirements: the application's parallel behavior is obtained via direct execution, while the duration of individual operations is taken from a performance prediction model or from prior measurements. Simulations may then vary cluster model parameters, operation durations and problem decomposition parameters to analyze their impact on application performance and identify the limiting factors. We implemented the proposed techniques by adding direct execution simulation capabilities to the Dynamic Parallel Schedules parallelization framework. We introduce the concept of dynamic efficiency to express resource utilization efficiency as a function of time. We verify the accuracy of our simulator by comparing the actual running time and dynamic efficiency of parallel program executions with the running time and dynamic efficiency predicted by the simulator under different parallelization and dynamic node allocation strategies.
Journal Articles
10.1016/j.jcss.2007.07.003