PAI: A lightweight mechanism for single-node memory recovery in DSM servers
Several recent studies identify the memory system as the most frequent source of hardware failures in commercial servers. Techniques to protect the memory system from failures must continue to service memory requests, despite hardware failures. Furthermore, to support existing OS's, the physical address space must be retained following reconfiguration. Existing techniques either suffer from a high performance overhead or require pervasive hardware changes to support transparent recovery. In this paper, we propose Physical Address Indirection (PAI), a lightweight, hardware-based mechanism for memory system failure recovery. PAI provides a simple hardware mapping to transparently reconstruct affected data in alternate locations, while maintaining high performance and avoiding physical address changes. With full-system simulation of commercial and scientific workloads on a 16-node distributed shared memory server, we show that prior techniques have an average degraded mode performance loss of 14% and 51% for commercial and scientific workloads, respectively. Using PAI's dataswap reconstruction, the same workloads have 1% and 32% average performance losses. © 2007 IEEE.
Record created on 2009-04-06, modified on 2016-08-08