Agreement-related problems: from semi-passive replication to totally ordered broadcast
Agreement problems constitute a fundamental class of problems in the context of distributed systems. All agreement problems follow a common pattern: all processes must agree on some common decision, the nature of which depends on the specific problem. This dissertation mainly focuses on three important agreement problems: Replication, Total Order Broadcast, and Consensus.
Replication is a common means to introduce redundancy in a system, in order to improve its availability. A replicated server is a server that is composed of multiple copies so that, if one copy fails, the other copies can still provide the service. Each copy of the server is called a replica. Each replica must evolve in a manner that is consistent with the other replicas. Hence, updating the replicated server requires that all replicas agree on the set of modifications to carry out. There are two principal replication schemes to ensure this consistency: active replication and passive replication.
In Total Order Broadcast, processes broadcast messages to all processes, and all processes must deliver these messages in the same order. In addition, if one process delivers a message m, then all correct processes must eventually deliver m.
The problem of Consensus gives an abstraction to most other agreement problems. All processes initiate a Consensus by proposing a value. Then, all processes must eventually decide the same value v, which must be one of the proposed values.
These agreement problems are closely related to each other. For instance, Chandra and Toueg [CT96] show that Total Order Broadcast and Consensus are equivalent problems. In addition, Lamport [Lam78] and Schneider [Sch90] show that active replication needs Total Order Broadcast. As a result, active replication is also closely related to the Consensus problem.
The first contribution of this dissertation is the definition of the semi-passive replication technique. Semi-passive replication is a passive replication scheme based on a variant of Consensus (called Lazy Consensus and also defined here). From a conceptual point of view, the result is important as it helps to clarify the relation between passive replication and the Consensus problem. In practice, it makes it possible to design systems that react more quickly to failures.
The problem of Total Order Broadcast is well known in the field of distributed systems and algorithms. In fact, more than fifty algorithms have already been published on the problem. Although these algorithms are quite similar, they are difficult to compare because they often differ with respect to their actual properties, assumptions, and objectives. The second main contribution of this dissertation is to define five classes of Total Order Broadcast algorithms, and to relate existing algorithms to those classes.
The third contribution of this dissertation is to compare the expected performance of the various classes of Total Order Broadcast algorithms. To achieve this goal, we define a set of metrics to predict the performance of distributed algorithms.
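
The informal properties of Total Order Broadcast and Consensus stated in the abstract can be made concrete with a small checker. The following Python sketch is only an illustration and is not taken from the dissertation: the function names (same_order, agreement, valid_consensus) and the data layout (one finite delivery log per process, each message delivered at most once) are assumptions made for the example. A finite log can only approximate the "eventually" in the definitions.

```python
from itertools import combinations


def same_order(logs):
    """Total order: any two processes deliver their common messages
    in the same relative order.

    `logs` maps a (hypothetical) process name to the list of messages
    it delivered, in delivery order, without duplicates.
    """
    for a, b in combinations(logs.values(), 2):
        common = set(a) & set(b)
        if [m for m in a if m in common] != [m for m in b if m in common]:
            return False
    return True


def agreement(logs, correct):
    """If any process delivers m, every correct process delivers m.

    Checked on finished logs, which stands in for "eventually".
    `correct` lists the processes assumed not to have crashed.
    """
    delivered = set().union(*map(set, logs.values()))
    return all(delivered <= set(logs[p]) for p in correct)


def valid_consensus(proposals, decisions):
    """All deciding processes decide the same value, and that value
    is one of the proposed values."""
    decided = set(decisions.values())
    return len(decided) <= 1 and decided <= set(proposals.values())


if __name__ == "__main__":
    # p3 is assumed to have crashed after delivering "a" and "b".
    logs = {"p1": ["a", "b", "c"], "p2": ["a", "b", "c"], "p3": ["a", "b"]}
    print(same_order(logs))
    print(agreement(logs, correct=["p1", "p2"]))
    print(valid_consensus({"p1": 1, "p2": 2}, {"p1": 2, "p2": 2}))
```

In this sketch, a crashed process may hold a shorter log than the others without violating either broadcast property, as long as the messages it did deliver appear in the same order as at the correct processes.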
Faculté informatique et communications
Institut d'informatique fondamentale
Laboratoire de systèmes répartis
Jury: Roger Hersch, Jean-Yves Le Boudec, Dahlia Malkhi, Friedemann Mattern
Public defense: 2000-08-11
Record created on 2005-03-16, modified on 2016-08-08