Quantifying the Performance Differences Between PVM and TreadMarks
We compare two systems for parallel programming on networks of workstations: Parallel Virtual Machine (PVM), a message passing system, and TreadMarks, a software distributed shared memory (DSM) system. We present results for eight applications that were implemented using both systems. The programs are Water and Barnes-Hut from the SPLASH benchmark suite; 3-D FFT, Integer Sort (IS) and Embarrassingly Parallel (EP) from the NAS benchmarks; ILINK, a widely used genetic linkage analysis program; and Successive Over-Relaxation (SOR) and Traveling Salesman (TSP). Two different input data sets were used for five of the applications. We use two execution environments: a 155 Mbps ATM network with eight Sparc-20 model 61 workstations, and an eight-processor IBM SP/2. The differences in speedup between TreadMarks and PVM depend mainly on the application and only to a much lesser extent on the platform and the data set used. In particular, the TreadMarks speedup for six of the eight applications is within 15% of that achieved with PVM. For one application, the difference in speedup is between 15% and 30%, and for one application, the difference is around 50%. More important than the actual differences in speedup are their causes, which we investigate. The cost of sending and receiving messages on current networks of workstations is very high, and previous work has identified communication costs as the primary source of overhead in software DSM implementations. The observed performance differences between PVM and TreadMarks are therefore primarily a result of differences in the amount of communication between the two systems.
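The percentages above compare speedups. As a minimal sketch, assuming speedup is the sequential execution time divided by the parallel execution time and that the gap is expressed relative to the PVM speedup (the abstract does not spell out the exact formula), the comparison can be computed as follows. The function names `speedup`, `relative_speedup_gap` and `classify` are hypothetical, and the timings are invented for illustration, not measurements from the study.

```python
def speedup(t_sequential: float, t_parallel: float) -> float:
    """Speedup = sequential time / parallel time."""
    if t_parallel <= 0:
        raise ValueError("parallel time must be positive")
    return t_sequential / t_parallel


def relative_speedup_gap(s_pvm: float, s_tmk: float) -> float:
    """Assumed definition: TreadMarks' shortfall as a fraction of PVM's speedup."""
    return (s_pvm - s_tmk) / s_pvm


def classify(gap: float) -> str:
    """Buckets used in the abstract: within 15%, 15%-30%, or larger."""
    if gap <= 0.15:
        return "within 15%"
    if gap <= 0.30:
        return "15%-30%"
    return "more than 30%"


if __name__ == "__main__":
    # Hypothetical timings in seconds on 8 processors (not data from the paper).
    t_seq = 80.0
    s_pvm = speedup(t_seq, 11.0)
    s_tmk = speedup(t_seq, 12.0)
    gap = relative_speedup_gap(s_pvm, s_tmk)
    # prints: PVM 7.27, TreadMarks 6.67, gap 8.3% -> within 15%
    print(f"PVM {s_pvm:.2f}, TreadMarks {s_tmk:.2f}, gap {gap:.1%} -> {classify(gap)}")
```

With the same sequential time in both cases, the gap reduces to one minus the ratio of the PVM and TreadMarks parallel times, so only the two parallel runs need to be compared.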
We identified four factors that contribute to the larger amount of communication in TreadMarks: 1) extra messages due to the separation of synchronization and data transfer, 2) extra messages to handle access misses caused by the use of an invalidate protocol, 3) false sharing, and 4) diff accumulation for migratory data. We have quantified the effect of the last three factors by measuring the performance gain when each is eliminated. Because the separation of synchronization and data transfer is a fundamental characteristic of the shared memory model, its contribution to performance cannot be measured without departing completely from the shared memory model. Of the three remaining factors, TreadMarks’ inability to send data belonging to different pages in a single message is the most important. The effect of false sharing is quite limited. Reducing diff accumulation benefits migratory data only when the diffs completely overlap. When these performance impediments are removed, all of the TreadMarks programs perform within 25% of PVM, and for six out of eight experiments, TreadMarks is less than 5% slower than PVM.
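Diff accumulation can be illustrated with a simplified model of the twin-and-diff mechanism TreadMarks uses to support multiple writers: before the first write to a page, a copy of the page (the twin) is saved, and the modifications are later encoded as a diff by comparing the page with its twin. The sketch below is an illustration under stated assumptions, not TreadMarks code: a page is a list of integers, a diff is a dict from word offset to new value (real implementations encode changed runs more compactly), and the helper names `make_twin`, `compute_diff`, `apply_diffs`, `merge_diffs` and `migratory_writes` are hypothetical. Four writers in turn update the same four words, as with migratory data protected by a lock. Shipping every diff sends 16 words, whereas a single merged diff sends only 4 and produces the same page.

```python
from typing import Dict, List

# Simplified model (assumption): word offset -> new value.
Diff = Dict[int, int]


def make_twin(page: List[int]) -> List[int]:
    """Copy of the page taken before the first write in an interval."""
    return list(page)


def compute_diff(twin: List[int], page: List[int]) -> Diff:
    """Record only the words that changed since the twin was made."""
    return {i: v for i, (t, v) in enumerate(zip(twin, page)) if t != v}


def apply_diffs(page: List[int], diffs: List[Diff]) -> List[int]:
    """Apply diffs in order; later writes overwrite earlier ones."""
    result = list(page)
    for d in diffs:
        for i, v in d.items():
            result[i] = v
    return result


def merge_diffs(diffs: List[Diff]) -> Diff:
    """Collapse a sequence of diffs, keeping only the final value per word."""
    merged: Diff = {}
    for d in diffs:
        merged.update(d)
    return merged


def migratory_writes(page: List[int], writers: int, words: List[int]) -> List[Diff]:
    """Each writer in turn (e.g. holding a lock) updates the same words."""
    diffs: List[Diff] = []
    current = list(page)
    for w in range(writers):
        twin = make_twin(current)
        for i in words:
            current[i] += w + 1  # the write performed by writer w
        diffs.append(compute_diff(twin, current))
    return diffs


if __name__ == "__main__":
    page = [0] * 16
    diffs = migratory_writes(page, writers=4, words=[0, 1, 2, 3])
    accumulated = sum(len(d) for d in diffs)  # words sent if every diff is shipped
    merged = merge_diffs(diffs)
    print(accumulated, len(merged))  # 16 4
    assert apply_diffs(page, diffs) == apply_diffs(page, [merged])
```

In this model, merging saves nothing when successive diffs touch disjoint words, and the saving is largest when they overlap completely.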
Record created on 2005-10-17, modified on 2016-08-08