Broadcasting Messages in Fault-Tolerant Distributed Systems: the benefit of handling input-triggered and output-triggered suspicions differently
This paper investigates the two main, seemingly antagonistic approaches to broadcasting messages in fault-tolerant distributed systems: the approach based on Reliable Broadcast and the one based on View Synchronous Communication (VSC). We discuss both communication primitives in a system model with fair-lossy channels, which leads us to introduce the "time-bounded buffering" problem: VSC addresses this problem, whereas Reliable Broadcast does not. Moreover, we show that VSC solves Reliable Broadcast in a system model with "program-controlled crash". However, VSC does more than Reliable Broadcast, and this has a cost. We analyse this cost by distinguishing between two types of failure suspicions: input-triggered failure suspicions, which are related to incoming messages, and output-triggered failure suspicions, which are related to outgoing messages. We show that VSC has not exploited the difference between these two types of failure suspicions and has therefore been unable to resolve the dilemma between (1) short fail-over time and (2) infrequent incorrect exclusion of processes from the membership. We show how to escape from this dilemma by replacing the standard VSC broadcast primitive with two broadcast primitives, one sensitive to input-triggered suspicions and the other sensitive to output-triggered suspicions. This gives the best of both worlds.
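To make the distinction between the two kinds of suspicion concrete, the following minimal Python sketch tags each suspicion by the direction of the stalled message. It is an illustration only, not the protocol of the paper: the class `PeerMonitor`, its method names, the two timeout parameters, and the simplification that one acknowledgement clears all buffered messages for a peer are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Suspicion(Enum):
    INPUT_TRIGGERED = "input"    # raised while waiting for an incoming message
    OUTPUT_TRIGGERED = "output"  # raised while an outgoing message stays unacknowledged


@dataclass
class PeerMonitor:
    """Hypothetical monitor that keeps the two suspicion types apart."""
    input_timeout: float   # assumed: max wait for an expected incoming message
    output_timeout: float  # assumed: max time an outgoing message may stay buffered
    awaiting_since: dict = field(default_factory=dict)  # peer -> time we started waiting
    buffered_since: dict = field(default_factory=dict)  # peer -> time of oldest unacked send

    def expect_from(self, peer, now):
        # Record only the start of the wait; later calls do not reset it.
        self.awaiting_since.setdefault(peer, now)

    def received_from(self, peer):
        self.awaiting_since.pop(peer, None)

    def sent_to(self, peer, now):
        # Track the oldest unacknowledged message for this peer.
        self.buffered_since.setdefault(peer, now)

    def acked_by(self, peer):
        # Simplification: one acknowledgement clears everything buffered for the peer.
        self.buffered_since.pop(peer, None)

    def suspicions(self, now):
        """Return the set of (peer, Suspicion) pairs whose timeout has expired."""
        result = set()
        for peer, since in self.awaiting_since.items():
            if now - since > self.input_timeout:
                result.add((peer, Suspicion.INPUT_TRIGGERED))
        for peer, since in self.buffered_since.items():
            if now - since > self.output_timeout:
                result.add((peer, Suspicion.OUTPUT_TRIGGERED))
        return result
```

Because each suspicion carries its type, a layer built on top of such a monitor could tune the two timeouts independently and react differently to each kind, which is the separation the two proposed broadcast primitives rely on.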
Record created on 2005-07-13, modified on 2016-08-08