Abstract

Distributed Shared-Memory (DSM) computers, which partition physical memory among a collection of workstationlike computing nodes, are now a common way to implement parallel machines. Recently, there has been much interest in DSM machines that use software, instead of hardware, to implement coherence protocols to manage data replication and cache coherence. Software offers many advantages, not the least of which is the possibility of adding significant functionality - such as race detection -to a protocol. This paper describes a new, transparent, protocol-based technique for automatically detecting data races on-the-fly. An implementation of this approach in a DSM system running on a Thinking Machines CM-5 found data races in two of a set of five shared-memory benchmarks. Monitored applications had slowdowns ranging from O-3 on 32 nodes.

Details