Fuzzati, Rachele
Nestmann, Uwe
Schiper, AndrĂ©
A formal approach to fault tolerant distributed consensus
213
formal methods, distributed algorithms
distributed consensus
transition systems
correctness proofs
termination
2008
2008
The term distributed Consensus denotes the problem of getting a certain number of processes, that could be far away from each other and that exchange messages through some communication means, to all agree on the same value. This problem has been proved impossible to solve in asynchronous settings when at least one process can crash, i.e., stop working. Since the problem of reaching Consensus among processes is recurrent in the domain of distributed computation, many algorithms have been proposed for solving it, circumventing the impossibility result through the introduction of some kind of synchrony in the system. Such algorithms are traditionally expressed in natural language or in pseudocode, thus sometimes generating ambiguities on their contents and on their correctness proofs. In this thesis, we propose a simple, yet efficient way of providing formal descriptions and proofs of distributed Consensus algorithms. Such method is based on the use of inference rules, it requires very little prior knowledge in order to be understood, and follows closely the way algorithms are expressed in pseudocode, thus being intuitive for the users. To show the validity of our claims, we use our method to formalize two of the major distributed Consensus algorithms, namely the Chandra-Toueg and the Paxos algorithms. Using our rigorous description, we then formally prove that such algorithms guarantee the respect of the Validity, Agreement and Termination properties that every solution to the Consensus problem should provide. This proving exercise actually reveals interesting results. We see that the Chandra-Toueg and the Paxos algorithms have strong points of resemblance and their correctness proofs can be carried out in very similar manners. However, while the Chandra-Toueg algorithm proves to be correct under the point of view of the three properties, we discover that Paxos does not give any guarantee of terminating. This generates a philosophical question: should such algorithm be considered a Consensus algorithm or not?
EPFL
3980
Theses
10.5075/epfl-thesis-3980