Broadcasting Messages in Fault-Tolerant Distributed Systems: the benefit of handling input-triggered and output-triggered suspicions differently

Charron-Bost, Bernadette; Defago, Xavier; Schiper, André

2002

Abstract

This paper investigates the two main and seemingly antagonistic approaches to broadcasting messages in fault-tolerant distributed systems: the approach based on Reliable Broadcast, and the one based on View Synchronous Communication (or VSC for short). We discuss both communication primitives in a system model with fair-lossy channels, which leads us to introduce the "time-bounded buffering" problem: VSC addresses this problem, whereas Reliable Broadcast does not. Moreover, we show that VSC solves Reliable Broadcast in a system model with "program-controlled crash". However, VSC does more than Reliable Broadcast, and this has a cost. We analyse this cost by distinguishing between two types of failure suspicions: input-triggered failure suspicions, which are related to incoming messages, and output-triggered failure suspicions, which are related to outgoing messages. We show that VSC has not managed to exploit the difference between these two types of failure suspicions, and has therefore been unable to resolve the dilemma between (1) short fail-over time and (2) infrequent incorrect exclusion of processes from the membership. We show how to escape from this dilemma by replacing the standard VSC broadcast primitive with two broadcast primitives, one sensitive to input-triggered suspicions and the other sensitive to output-triggered suspicions. This makes it possible to get the best of both worlds.

Details

Laboratories LSR

Record Appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IC Archives > LSR - Distributed Systems Laboratory
Work produced at EPFL
Technical Reports
Published

Record creation date 2005-07-13
