Files

Abstract

Recent research suggests that DSM clusters can benefit from parallel coherence controllers. Parallel controllers requires address partitioning and synchronization to avoid handling multiple coherence events for the same memory address simultaneously. This paper evaluates a spectrum of address partitioning schemes that vary in performance, hardware complexity, and cost. Dynamic partitioning minimizes load imbalance in controllers by using hardware address synchronizers to distribute the load among multiple protocol engines at runtime. Static partitioning obviates the need for hardware synchronization and assigns memory addresses to protocol engines at design time, but may lead to load imbalance among engines. We present simulation results indicating that: (i) dynamic partitioning performs best speeding up application execution on an 8 8-way cluster on average by 62% using four-engine as compared to single-engine controllers, (ii) block- interleaved static partitioning using low-order address bits is an attractive alternative and performs close to dynamic partitioning when protocol occupancies are low or there is little queueing, and (iii) previously proposed static schemes that partition memory pages either into home and remote engines or using low-order page address bits results in a high load imbalance in parallel controllers.

Details

Actions

Preview