Concurrency and dynamic protocol update for group communication middleware
The last three decades have seen computers invading our society: computers are now present at work to improve productivity and at home to enlarge the scope of our hobbies and to communicate. Furthermore, computers are now part of many critical systems such as anti-lock braking systems (ABS) in our cars, airplane control systems, space rockets, nuclear power plants, banking and trading systems, medical care systems, and so on. The importance of these systems requires a high level of trust in computer-based systems. For example, a failure in a trading system, even a temporary one, may result in severe economic losses. Hence coping with failures is a key aspect of computer systems. A common approach to tolerating failures is to replicate a system that provides a critical service, so that when a failure occurs on a given replica, requests to the critical service are still executed by the other replicas. This approach has the advantage of masking failures, i.e., requests to the service continue to be executed even in the presence of failures. However, replication introduces a performance cost, mainly because the execution of the service requests must be coordinated among all replicas. Furthermore, despite its apparent simplicity, replication is rather complex to implement. Replication is made easier by group communication, which defines several abstractions that can be used by the designer of replicated systems. The group communication abstractions are implemented by distributed protocols that together compose a group communication middleware. The aim of this thesis is to study two techniques to improve the performance of group communication middleware, and thus reduce the cost of replication. First, we study dynamic protocol update, which allows group communication middleware to adapt to changes in its environment.
More specifically, dynamic protocol update consists of replacing, at runtime, a given protocol of the group communication middleware with a similar but more efficient protocol. The thesis provides several solutions to dynamic protocol update. For instance, we describe two algorithms to dynamically replace consensus and atomic broadcast, two essential protocols of a group communication middleware. Second, we propose solutions to introduce concurrency within a group communication middleware in order to benefit from the advantages offered by multiprocessor (or multicore) computers.
Keywords: fault tolerance ; replication ; group communication ; middleware ; adaptive systems ; distributed algorithms ; consensus ; atomic broadcast ; dynamic protocol update ; concurrency ; SAMOA
Thesis, École polytechnique fédérale de Lausanne (EPFL), n° 4244 (2009)
Programme doctoral Informatique, Communications et Information
Faculté informatique et communications
Institut d'informatique fondamentale
Laboratoire de systèmes répartis
Record created on 2008-10-02, modified on 2016-12-12