Failure Detectors: implementation issues and impact on consensus performance

Sergent, Nicole; Defago, Xavier; Schiper, Andre

Sergent, Nicole; Defago, Xavier; Schiper, Andre

1999

Formats

Format
BibTeX
MARC
MARCXML
DublinCore
EndNote
NLM
RefWorks
RIS

Files

Abstract

Due to their nature, distributed systems are vulnerable to failures of some of their parts. Conversely, distribution also provides a way to increase the fault tolerance of the overall system. However, achieving fault tolerance is not a simple problem and requires complex techniques. An agreement problem known as the problem of consensus is at the heart of most problems encountered during the design of a fault tolerant system. This problem is however not solvable in the asynchronous system model, unless the model is augmented with adequate failure detectors. The resulting system model is a time-free model since all timing issues are abstracted by the characteristics of the failure detectors. It is sometimes claimed that time-based system models are more realistic than time-free models for solving distributed agreement problems. The goal of this paper is to show that solving consensus in the asynchronous system model augmented with failure detectors does not prevent from considering timing issues. We consider the consensus algorithm with various implementations of failure detectors, and we analyse their impact on the termination time of the consensus algorithm. This study shows that the design of fault-tolerant distributed algorithms in the asynchronous system model augmented with failure detectors is orthogonal to the issue of implementing the actual failure detectors. This nicely decouples logical issues (proof of safety and liveness of an algorithm) from engineering issues (e.g., performance and timing constraints).

Details

Title Failure Detectors: implementation issues and impact on consensus performance

Author(s) Sergent, Nicole ; Defago, Xavier ; Schiper, Andre

Date 1999

Laboratories LSR

Record Appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IC Archives > LSR - Distributed Systems Laboratory
Work produced at EPFL
Technical Reports
Published

Record creation date 2005-07-13

Actions

Preview

Select file: