Temporal Streaming of Shared Memory

Coherent read misses in shared-memory multiprocessors account for a substantial fraction of execution time in many important scientific and commercial workloads. We propose Temporal Streaming, to eliminate coherent read misses by streaming data to a processor in advance of the corresponding memory accesses. Temporal streaming dynamically identifies address sequences to be streamed by exploiting two common phenomena in shared-memory access patterns: (1) temporal address correlation — groups of shared addresses tend to be accessed together and in the same order, and (2) temporal stream locality — recently- accessed address streams are likely to recur. We present a practical design for temporal streaming. We evaluate our design using a combination of trace-driven and cycle- accurate full-system simulation of a cache-coherent distributed shared-memory system. We show that temporal streaming can eliminate 98% of coherent read misses in scientific applications, and between 43% and 60% in database and web server workloads. Our design yields speedups of 1.07 to 3.29 in scientific applications, and 1.06 to 1.21 in commercial workloads.

Published in:
Proceedings of the International Symposium on Computer Architecture, 222-233
Presented at:
Madison, WI, June 4-8, 2005

 Record created 2007-10-16, last modified 2018-03-18

Download fulltext

Rate this document:

Rate this document:
(Not yet reviewed)