The Heard-Of Model: Computing in Distributed Systems with Benign Failures
Problems in fault-tolerant distributed computing have been studied in a variety of models. These models are structured around two central ideas:
- Degree of synchrony and failure model are two independent parameters that determine a particular type of system.
- The notion of faulty component is helpful and even necessary for the analysis of distributed computations when failures occur.
In this work, we question these two basic principles of fault-tolerant distributed computing, and show that it is both possible and worthy to renounce them in the context of benign failures: we present a computational model, suitable for systems with benign failures, which is based only on the notion of transmission failure.
In this model, computations evolve in rounds, and messages missed at a round are lost. Only information transmission is represented: for each round r and each process p, our model provides the set of processes that p ``hears of'' at round r (heard-of set) namely the processes from which p receives some message at round r. The features of a specific system are thus captured as a whole, just by a predicate over the collection of heard-of sets. We show that our model handles benign failures, be they static or dynamic, permanent or transient, in a unified framework.
Using this new approach, we are able to give shorter and simpler proofs of important results (non-solvability, lower bounds). In particular, we prove that in general, Consensus cannot be solved without an implicit and permanent consensus on heard-of sets. We also examine Consensus algorithms in our model. In light of this specific agreement problem, we show how our approach allows us to devise new interesting solutions.