Conception and implementation of a toolkit for building fault-tolerant distributed applications in large scale networks

Large scale systems are becoming more and more common today. Many distributed applications are emerging that use the capability of world-wide internetworking. Since many applications require availability and consistency in the presence of failures, an adequate support for fault-tolerance is necessary. This can be provided by different paradigms and their implementations. Unfortunately, most of these implementations consider only local area networks, whereas this thesis describes a system, called Phoenix, which aims at large scale networks where additional types of failure have to be taken into account. This thesis gives a complete description of Phoenix, a toolkit for building fault-tolerant, distributed applications in large scale networks. Fault-tolerance in Phoenix is achieved using replicated process groups, and consistency within one process group is achieved by using view synchronous communication. The particularity of Phoenix is the provision of this fault-tolerance and consistency in a large scale environment, where large scale is two-fold: (1) the wide geographical distribution of the replicated processes, and (2) a high number of participating processes in the system. The description of Phoenix given here is based on its architecture. Each layer of Phoenix focuses on a particular problem and proposes a solution. Lower layers are responsible for the geographical large scale aspects and their problems, whereas higher layers provide high order communication and deal with numerical large scale aspects. In large scale networks, in addition to the increased unpredictable latency of messages, communication protocols have to deal with link failures, which are often only transient. The dynamic routing layer in the Phoenix architecture tries to mask these link failures by rerouting. This rerouting not only gives increased reliability of communication, but also a more stable and accurate image of the reachability of the processes. On top of the dynamic routing layer, the reliable communication layer provides eventually reliable channels, i.e. messages sent are eventually delivered at the destination provided that the sender and the destination processes are correct. This layer takes into account different parameters of large scale networks, such as (1) increased, unpredictable latency, and (2) non-negligible packet desequencing and (3) important packet loss. The consistency among the replicas is based on a new implementation of the virtually synchronous communication paradigm. The implementation is part of the view synchronous communication layer and is based on a modified consensus protocol together with the eventually reliable channels of the reliable communication layer. The modified consensus protocol itself is based on an unstable suspicion model, where incorrectly suspected processes can be considered alive at a later point. This will be exploited to make the protocol alive whenever a majority of replicas can communicate with each other. The situation where a distributed system is cut into smaller subsystems, and none of these subsystems contains a majority, is not uncommon in large scale, but is often only transient. Further, the dynamic routing layer already does a maximum to avoid this situation. Based on the view synchronous communication layer, the ordered multicast communication layer provides different ordering primitives based on solid, theoretical definitions, allowing the implementation of different total and uniform orders. The numerical large scale is considered by assigning different roles to the processes of a distributed system without leaving the context of groups. The idea is to concentrate the fault-tolerant aspect to a small set of core processes, whilst still guaranteeing convenient and efficient access semantics to processes outside these core processes.

Schiper, André
Lausanne, EPFL

 Record created 2005-03-16, last modified 2018-03-17

Texte intégral / Full text:
Download fulltext

Rate this document:

Rate this document:
(Not yet reviewed)