Falsafi, Babak; Lebeck, Alvin R.; Reinhardt, Steven K.; Schoinas, Ioannis; Hill, Mark D.; Larus, James R.; Rogers, Anne; Wood, David A.
2009-04-06
1994
10.1109/SUPERC.1994.344301
https://infoscience.epfl.ch/handle/20.500.14299/36895

Recent distributed shared memory (DSM) systems and proposed shared-memory machines have implemented some or all of their cache coherence protocols in software. One way to exploit the flexibility of this software is to tailor a coherence protocol to match an application's communication patterns and memory semantics. This paper presents evidence that this approach can lead to large performance improvements. It shows that application-specific protocols substantially improved the performance of three application programs (appbt, em3d, and barnes) over carefully tuned transparent shared memory implementations. The speed-ups were obtained on Blizzard, a fine-grained DSM system running on a 32-node Thinking Machines CM-5.

Application-specific protocols for user-level shared memory
text::conference output::conference proceedings::conference paper