Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors

Gold, Brian T.; Falsafi, Babak; Hoe, James C.

doi:10.1109/PRDC.2009.39

2009

Abstract

Distributed shared-memory (DSM) multiprocessors provide a scalable hardware platform but lack the redundancy necessary for mainframe-level reliability and availability. Chip-level redundancy in a DSM server faces a key challenge: the increased latency of checking results among redundant components. To address the resulting performance overhead, we propose a checking filter that reduces the number of checking operations that impede the critical path of execution. Furthermore, we propose to decouple checking operations from the coherence protocol, which simplifies the implementation and permits reuse of existing coherence controller hardware. Simulation results for commercial workloads indicate that the average performance overhead is within 4% (9% maximum) of tightly coupled dual modular redundancy (DMR) solutions.
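The abstract does not say how the checking filter decides which operations to check, so the following is only an illustrative toy, not the authors' design. It assumes a hypothetical policy in which only operations that would expose data outside the redundant pair (here labeled `writeback` and `io`) must be compared synchronously, while all other operations are folded into running CRC-32 signatures that are compared once, off the critical path. The names `Op`, `BLOCKING_KINDS`, and `run_pair` are hypothetical. The sketch simply counts how many comparisons would stall execution with and without such a filter.

```python
import zlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Op:
    kind: str   # hypothetical operation kinds: "load", "store", "writeback", "io"
    addr: int
    value: int


# Hypothetical filter policy (an assumption, not taken from the paper): only
# operations that would expose data outside the redundant pair must be compared
# across chips before they proceed; all others are folded into a running signature.
BLOCKING_KINDS = {"writeback", "io"}


def run_pair(primary: List[Op], mirror: List[Op], use_filter: bool) -> Tuple[int, bool]:
    """Return (checks on the critical path, whether a divergence was detected)."""
    if len(primary) != len(mirror):
        raise ValueError("redundant copies produced different operation counts")
    blocking_checks = 0
    mismatch = False
    sig_p = sig_m = 0  # running CRC-32 signatures for the deferred (filtered) operations
    for a, b in zip(primary, mirror):
        if not use_filter or a.kind in BLOCKING_KINDS:
            # Synchronous compare: execution would stall until the result returns.
            blocking_checks += 1
            mismatch |= a != b
        else:
            # Deferred: fold each copy's operation into its own signature.
            sig_p = zlib.crc32(repr(a).encode(), sig_p)
            sig_m = zlib.crc32(repr(b).encode(), sig_m)
    # One signature comparison off the critical path covers all deferred operations.
    mismatch |= sig_p != sig_m
    return blocking_checks, mismatch


if __name__ == "__main__":
    trace = [Op("load", 0x10, 1), Op("store", 0x10, 5),
             Op("store", 0x20, 6), Op("writeback", 0x10, 5)]
    faulty = list(trace)
    faulty[1] = Op("store", 0x10, 7)  # inject a single corrupted store value
    print(run_pair(trace, trace, use_filter=False))   # (4, False)
    print(run_pair(trace, trace, use_filter=True))    # (1, False)
    print(run_pair(trace, faulty, use_filter=True))   # (1, True)
```

In this toy, the filter cuts the stalling comparisons from four to one while the deferred signature still catches the corrupted store. The sketch does not model timing: in any real design, the deferred comparison would have to complete before unchecked data could leave the redundant pair.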

Details

Title Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors

Author(s) Gold, Brian T. ; Falsafi, Babak ; Hoe, James C.

Published in Proceedings of the 15th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC 2009)

Pages 195-201

Date 2009

DOI https://doi.org/10.1109/PRDC.2009.39

Laboratories PARSA

Record appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > PARSA - Parallel Systems Architecture Laboratory; Peer-reviewed publications; Conference Papers; Work produced at EPFL

Status Published

Record creation date 2010-02-15
