000032615 001__ 32615
000032615 005__ 20190509131945.0
000032615 0247_ $$2doi$$a10.5075/epfl-thesis-2139
000032615 02471 $$2nebis$$a3900288
000032615 037__ $$aTHESIS
000032615 041__ $$aeng
000032615 088__ $$a2139
000032615 245__ $$bfrom fair-lossy channels to crash-recovery of processes$$aSolving consensus
000032615 269__ $$a2000
000032615 260__ $$bEPFL$$c2000$$aLausanne
000032615 300__ $$a123
000032615 336__ $$aTheses
000032615 502__ $$aBernadette Charron-Bost, Rachid Guerraoui, Claude Petitpierre, Luis Rodrigues, Alfred Strohmeier
000032615 520__ $$aDistributed systems are the basis of widespread computing facilities enabling many of our daily life activities. Telebanking, electronic commerce, online booking-reservation, and telecommunication are examples of common services that rely on distributed systems in order to achieve their desirable ubiquity. The success of these services increasingly depends on fault-tolerant systems to preserve their availability and reliability despite partial failures. Fault tolerance can be achieved by introducing redundancy into the system. This leads to the need of coordinating replicated components which has been one of the biggest trends in software technologies. This thesis is about process coordination in fault-tolerant distributed systems. In particular, it focuses on the problem of reaching agreement among processes considering the well-known Consensus problem. Consensus is regarded as an abstract problem whose solutions embody the complexity inherent to most agreement problems. This thesis studies the solvability of consensus in asynchronous network systems augmented with a failure detector oracle. Both, the processes and the network, are subject to benign failures. The thesis' overall contribution is a consensus algorithm that tolerates process crash and network omission failures, and exploits the recovery of crashed process. A contribution of this thesis is the definition of a novel type of weakly reliable communication channels, called Stubborn. The semantics of Stubborn channels reflect the reliability on communication required to solve consensus. Furthermore, it is shown that an algorithm designed with Stubborn in-stead of reliable channels can offer the same number of messages exchanged in favorable runs and avoid communication steps in disadvantageous runs. Stubborn channels can be easily implemented on top of existing network layers and make reasonable requirements on message buffering space. To handle the crash and recovery of processes it is proposed a model which encompasses the different patterns of crashes and recoveries that can be exhibited by the processes. The approach considers two different problems concerning crash and recovery: (1) processes that can be disruptive because they are always crashing and recovering, and (2) amnesia failures caused by crashes. The solution for the first problem consists in the use of a new class of failure detectors distinguished by its recurrent strong completeness property. To cope with the crash and recovery of any process involved, process amnesia cannot be arbitrary and thus processes are equipped with a stable storage device. The proposed algorithm judiciously specifies accesses to stable storage, and by exploiting the recovery of processes it can solve consensus in scenarios where other algorithms would otherwise block.
000032615 700__ $$0(EPFLAUTH)109458$$g109458$$aOliveira, Rui
000032615 720_2 $$aSchiper, André$$edir.$$g106377$$0241767
000032615 8564_ $$uhttps://infoscience.epfl.ch/record/32615/files/EPFL_TH2139.pdf$$zTexte intégral / Full text$$s745699$$yTexte intégral / Full text
000032615 909C0 $$xU10411$$0252206$$pLSR
000032615 909CO $$pDOI$$pIC$$ooai:infoscience.tind.io:32615$$qDOI2$$qGLOBAL_SET$$pthesis
000032615 918__ $$cIIF$$aIC
000032615 919__ $$aLSR
000032615 920__ $$b2000$$a2000-3-24
000032615 970__ $$a2139/THESES
000032615 973__ $$sPUBLISHED$$aEPFL
000032615 980__ $$aTHESIS