Infoscience

Report

Never Say Never Probabilistic & Temporal Failure Detectors (Extended)

The failure detector approach for solving distributed computing problems has been celebrated for its modularity. This approach allows the construction of algorithms using abstract failure detection mechanisms, defined by axiomatic properties, as building blocks. The minimal synchrony assumptions on communication, which enable to implement the failure detection mechanism, are studied separately. Such synchrony assumptions are typically expressed as eventual guarantees that need to hold, after some point in time, forever and deterministically. But in practice, they never do. Synchrony assumptions may hold only probabilistically and temporarily. In this paper, we study failure detectors in a realistic distributed system N, with asynchrony inflicted by probabilistic synchronous communication. We address the following paradox about the weakest failure detector to solve the consensus problem (and many equivalent problems), i.e., <>S: an implementation of “consensus with probability 1” is possible in N without using randomness in the algorithm itself, while an implementation of “<>S with probability 1” is impossible to achieve in N. We circumvent this paradox by introducing a new failure detector <>S*, a variant of <>S with probabilistic and temporal accuracy. We prove that <>S* is implementable in N and we provide an optimal <>S* implementation. Interestingly, we show that <>S* can replace <>S , in several existing deterministic consensus algorithms using <>S, to yield an algorithm that solves “consensus with probability 1”. In fact, we show that such result holds for all decisive problems (not only consensus) and also for failure detector <>P (not only <>S). The resulting algorithms combine the modularity of distributed computing practices with the practicality of networking ones.

Related material