Files

Abstract

Sequential consistency (SC) is the simplest programming interface for shared-memory systems but imposes program order among all memory operations, possibly precluding high performance implementations. Release consistency (RC), however, enables the highest performance implementations but puts the burden on the programmer to specify which memory operations need to be atomic and in program order. This paper shows, for the first time, that SC implementations can perform as well as RC implementations if the hardware provides enough support for speculation. Both SC and RC implementations rely on reordering and overlapping memory operations for high performance. To enforce order when necessary, an RC implementation uses software guarantees, whereas an SC implementation relies on hardware speculation. Our SC implementation, called SC++, closes the performance gap because: (1) the hardware allows not just loads, as some current SC implementations do, but also stores to bypass each other speculatively to hide remote latencies, (2) the hardware provides large speculative state for not just processor, as previously proposed, but also memory to allow out-of- order memory operations, (3) the support for hardware speculation does not add excessive overheads to processor pipeline critical paths, and (4) well- behaved applications incur infrequent rollbacks of speculative execution. Using simulation, we show that SC++ achieves an RC implementation's performance in all the six applications we studied

Details

PDF