The database state machine and group communication issues

Distributed computing is reshaping the way people think about and carry out everyday activities. On-line ticket reservation, electronic commerce, and telebanking are examples of services that would be hardly imaginable without distributed computing. Nevertheless, the widespread use of computers has some implications. As we become more dependent on computers, computer malfunction increases in importance. Until recently, discussions about fault-tolerant computer systems were restricted to very specific contexts, but this is starting to change. This thesis is about the design of fault-tolerant computer systems. More specifically, it focuses on how to develop database systems that behave correctly even in the event of failures. To achieve this objective, this work exploits the notions of data replication and group communication. Data replication is an intuitive way of dealing with failures: if one copy of the data is not available, access another one. However, guaranteeing the consistency of replicated data is not an easy task. Group communication is a high-level abstraction that defines patterns for communication among computer sites. The present work advocates the use of group communication to enforce data consistency. This thesis makes four major contributions. In the database domain, it introduces the Database State Machine and the Reordering technique. The Database State Machine is an approach to executing transactions in a cluster of database servers that communicate by message passing and have access to neither shared memory nor a common clock. In the Database State Machine, read-only transactions are processed locally on a database site, while update transactions are first executed locally on a database site and then broadcast to the other database sites for certification and possibly commit. The certification test, necessary to commit update transactions, may result in aborts.
To increase the number of transactions that successfully pass the certification test, we introduce the Reordering technique, which reorders transactions before they are committed. In the distributed system domain, the Generic Broadcast problem and the Optimistic Atomic Broadcast algorithm are proposed. Generic Broadcast is a group communication primitive that allows applications to define any order requirement they need. Reliable Broadcast, which does not guarantee any order on the delivery of messages, and Atomic Broadcast, which guarantees total order on the delivery of all messages, are special cases of Generic Broadcast. Using Generic Broadcast, we define a group communication primitive that guarantees the exact order needs of the Database State Machine. We also present an algorithm that solves Generic Broadcast. Optimistic Atomic Broadcast algorithms exploit system properties to implement total order delivery quickly. These properties do not always hold. However, if they hold for a certain period, total order delivery of messages is ensured faster than with traditional Atomic Broadcast algorithms. This thesis discusses optimism in the implementation of Atomic Broadcast primitives, and presents in detail the Optimistic Atomic Broadcast algorithm. The optimistic broadcast approach presented in this thesis is based on the spontaneous total order message reception property, which holds with high probability in local area networks under normal execution conditions (e.g., moderate load).
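
The certification step described in the abstract can be illustrated with a small sketch. The abstract does not spell out the test itself, so the rule used here is an assumption: a delivered update transaction commits only if no transaction committed since it started has written an item it read. The names `Transaction`, `Site`, `deliver`, and `run` are hypothetical and chosen for illustration only; they are not taken from the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    tid: str
    start: int  # number of commits already applied when the transaction began
    readset: frozenset = frozenset()
    writeset: frozenset = frozenset()


@dataclass
class Site:
    # Write sets of committed transactions, in commit order.
    committed: list = field(default_factory=list)

    def deliver(self, txn):
        """Certify a delivered update transaction; return True on commit."""
        # Assumed rule: only transactions committed after txn started
        # can invalidate what txn has read.
        for writes in self.committed[txn.start:]:
            if writes & txn.readset:
                return False  # txn read a stale item: abort
        self.committed.append(txn.writeset)
        return True


def run(order):
    """Deliver transactions in the given order to a fresh site."""
    site = Site()
    return [site.deliver(t) for t in order]


# t1 and t2 both start before any commit; t3 starts after one commit.
t1 = Transaction("t1", 0, frozenset({"x"}), frozenset({"x"}))
t2 = Transaction("t2", 0, frozenset({"x"}), frozenset({"y"}))
t3 = Transaction("t3", 1, frozenset({"y"}), frozenset({"y"}))

# Two sites that deliver the same sequence reach the same decisions.
site_a = run([t1, t2, t3])
site_b = run([t1, t2, t3])
print(site_a, site_a == site_b)
```

In this run, t2 aborts because t1 committed a write to `x` after t2 had read it. Because the test is deterministic and depends only on the delivery order, every site that delivers the same sequence takes the same commit or abort decisions without further coordination.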

Schiper, André
Lausanne, EPFL

 Record created 2005-03-16, last modified 2018-01-27
