OpenMP for Networks of SMPs
In this paper, we present the first system that implements OpenMP on a network of shared-memory multiprocessors. This system enables the programmer to rely on a single, standard, shared-memory API for parallelization both within a multiprocessor and between multiprocessors. It is implemented via a translator that converts OpenMP directives to appropriate calls to a modified version of the TreadMarks software distributed shared memory (SDSM) system. In contrast to previous SDSM systems for SMPs, the modified TreadMarks system uses POSIX threads for parallelism within an SMP node. This approach greatly simplifies the changes required to the SDSM in order to exploit the intra-node hardware shared memory. We present performance results for seven applications (Barnes-Hut, CLU and Water from SPLASH-2, 3D-FFT from NAS, Red-Black SOR, TSP and MGS) running on an SP2 with four four-processor SMP nodes. A comparison between the thread implementation and the original implementation of TreadMarks shows that using the hardware shared memory within an SMP node achieves speedups that are up to 30% better than those of the original versions. We also compare SDSM against message passing. Overall, the speedups of the multithreaded TreadMarks programs are within 7-30% of those of the MPI versions.
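To make the translation idea concrete, the following is a minimal sketch of how an OpenMP parallel loop might be lowered to POSIX threads within a single node. It is an illustration under stated assumptions, not the translator or the TreadMarks interface described here: the names `region_args`, `outlined_region`, `parallel_sum` and `MAX_THREADS` are hypothetical, and the SDSM calls that would distribute work between nodes are not shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Source form the programmer would write:
 *
 *   #pragma omp parallel for reduction(+:sum)
 *   for (i = 0; i < n; i++)
 *       sum += a[i];
 *
 * Below is a hand-written lowering of that loop to POSIX threads.
 * All names are hypothetical; a real translator would also emit
 * SDSM calls for sharing data between nodes, omitted here.
 */

#define MAX_THREADS 16

/* Arguments passed to each thread running the outlined loop body. */
typedef struct {
    const double *a;    /* shared input array (hardware shared memory) */
    size_t begin, end;  /* this thread's iteration range [begin, end)  */
    double partial;     /* private partial result for the reduction    */
} region_args;

/* The loop body, outlined into a function so a thread can run it. */
static void *outlined_region(void *p)
{
    region_args *r = p;
    double s = 0.0;
    for (size_t i = r->begin; i < r->end; i++)
        s += r->a[i];
    r->partial = s;
    return NULL;
}

/* Fork threads over a block-partitioned range, join, then reduce. */
double parallel_sum(const double *a, size_t n, int nthreads)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    pthread_t tid[MAX_THREADS];
    region_args args[MAX_THREADS];
    size_t chunk = (n + (size_t)nthreads - 1) / (size_t)nthreads;

    for (int t = 0; t < nthreads; t++) {
        size_t b = (size_t)t * chunk;
        size_t e = b + chunk;
        if (b > n) b = n;   /* clamp ranges that fall past the end */
        if (e > n) e = n;
        args[t].a = a;
        args[t].begin = b;
        args[t].end = e;
        args[t].partial = 0.0;
        int rc = pthread_create(&tid[t], NULL, outlined_region, &args[t]);
        assert(rc == 0);
        (void)rc;
    }

    double sum = 0.0;
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tid[t], NULL);  /* join acts as the implicit barrier */
        sum += args[t].partial;      /* combine partial sums after join   */
    }
    return sum;
}
```

Calling `parallel_sum(a, n, 4)` on a four-processor node would run the loop on four threads that read `a` directly through hardware shared memory, which is the intra-node case; compile with `-pthread`.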