Conference paper

Never Say Never Probabilistic & Temporal Failure Detectors

The failure detector approach for solving distributed computing problems has been celebrated for its modularity. This approach allows the construction of algorithms using abstract failure detection mechanisms, defined by axiomatic properties, as building blocks. The minimal synchrony assumptions on communication, which enable to implement the failure detection mechanism, are studied separately. Such synchrony assumptions are typically expressed as eventual guarantees that need to hold, after some point in time, forever and deterministically. But in practice, they never do. Synchrony assumptions may hold only probabilistically and temporarily. In this paper, we study failure detectors in a realistic distributed system N, with asynchrony inflicted by probabilistic synchronous communication. We address the following paradox: an implementation of "consensus with probability 1" is possible in N without using randomness in the algorithm itself, while an implementation of "lozenge S with probability 1" is impossible to achieve in N (lozenge S being the weakest failure detector to solve the consensus problem and many equivalent problems). We circumvent this paradox by introducing a new failure detector lozenge S*, a variant of lozenge S with probabilistic and temporal accuracy. We prove that lozenge S* is implementable in N and we provide an optimal lozenge S* algorithm. Interestingly, we show that lozenge S* can replace lozenge S, in several existing deterministic consensus algorithms using lozenge S, to yield an algorithm that solves "consensus with probability 1". In fact, we show that such result holds for all decisive problems (not only consensus) and also for failure detector lozenge P (not only lozenge S). The resulting algorithms combine the modularity of distributed computing practices with the practicality of networking ones.


Related material