TurboTag: Lookup Filtering to Reduce Coherence Directory Power

On-chip coherence directories of today’s multi-core systems are not energy efficient. Coherence directories dissipate a significant fraction of their power on unnecessary lookups when running commercial server and scientific workloads. These workloads have large working sets that are beyond the reach of on-chip caches of modern processors. Limited to capturing a small part of the working set, private caches retain cache blocks only for a short period of time before replacing them with new blocks. Moreover, coherence enforcement is a known performance bottleneck of multi-threaded software, hence data-sharing in optimized high-performance software is minimal. Consequently, the majority of the accesses to the coherence directory find no sharers in the directory because the data are not available in the on-chip private caches, effectively wasting power on the coherence checks. To improve energy-efficiency for future many-core systems, we propose TurboTag, a filtering mechanism to eliminate needless directory lookups. We analyze full-system traces of server and scientific workloads and find that over 69% of accesses to the directory find no sharers and can be entirely avoided. Taking advantage of this behavior, TurboTag achieves a 58% reduction in the directory’s dynamic power consumption.

Published in:
Proceedings of the 16th International Symposium on Low Power Electronics and Design (ISLPED 10), 377-382
Presented at:
16th International Symposium on Low Power Electronics and Design (ISLPED 10), Austin, Texas, USA, August 18-20
New York, NY, USA, ACM

 Record created 2010-06-29, last modified 2018-01-28

External link:
Download fulltext
Rate this document:

Rate this document:
(Not yet reviewed)