Early Consensus in an Asynchronous System with a Weak Failure Detector
Consensus is one of the most fundamental problems in the context of fault-tolerant distributed computing. The problem consists, given a set P of processes having each an initial value vi, in deciding among P on a common value v. In 1985, Fischer, Lynch and Paterson proved that the consensus problem is not solvable in an asynchronous system subject to a single process crash. In 1991, Chandra and Toueg showed that, by augmenting the asynchronous system model with a well defined "unreliable" failure detector, consensus becomes solvable. They also give an algorithm that solves consensus using the <>S failure detector. In this paper we propose a new consensus algorithm, also using the <>S failure detector, that is more efficient than the Chandra-Toueg consensus algorithm. We measure efficiency by introducing the notion of "latency degree", which defines the minimal number of communication steps needed to solve consensus. The Chandra-Toueg algorithm has a latency degree of 3 (it requires at least three communication steps), whereas our early consensus algorithm requires only two communication steps (latency degree of 2). We believe that this is an interesting result, which adds to our current understanding of the cost of consensus algorithms based on <>S.
ERRATUM Lines 34 and 46 of Figure 1 are incomplete. They should read: 34 upon reception of (p_j,r_i,suspicion) from p_j when phase_i=1: 46 r_i := r_i+1; estimate_i.first := i;
Record created on 2005-05-20, modified on 2016-08-08