Model Checking of Distributed Algorithm Implementations

It is notoriously difficult to develop reliable, high-performance distributed systems that run over asynchronous networks. Even if a distributed system is based on a well-understood distributed algorithm, its implementation can contain errors arising from complexities of realistic distributed environments or simply coding errors. Many of these errors can only manifest after the system has been running for a long time, has developed a complex topology, and has experienced a particular sequence of low-probability events such as node resets. Model checking or systematic state space exploration, which has been used for testing of centralized systems, is also not effective for testing of distributed applications. The aim of these techniques is to exhaustively explore all the reachable states and verify some user-specified invariants on them. Although effective for small software systems, for more complex systems such as distributed systems the exponential increase in number of explored states, manifests itself as a problem at the very early stages of search. This phenomenon, which is also known as exponential state space explosion problem, prevents the model checker from reaching the potentially erroneous states at deeper levels, in a realistic time frame. This thesis proposes Dervish, a new approach in testing that makes use of a model checker in parallel with the running distributed system. Before the model checker performance gets hampered by the exponential explosion problem, the model checker restarts form the current live state of the system, instead of the initial state. The continuously running model checker at each node predicts the possible future inconsistencies, before they actually manifest. This approach, not only helps in testing by checking more relevant states that could occur in a real run, but also enables the application to steer the execution away from the predicted inconsistencies. We identified new bugs in mature Mace implementations of RandTree, Bullet', Paxos, and Chord distributed systems. Furthermore, we show that if the bug is not corrected during system development, Dervish is effective in steering the execution away from the inconsistent states at runtime. To be feasible in practice, the state exploration algorithm in Dervish should be efficient enough to explore some useful states in the period between each two restarts. Our default implementation of this approach benefits from a new search heuristic effective for distributed algorithms with short communications, termed consequence prediction, which selectively explores future event chains of the system. For consensus algorithms, however, which are known to be one of the most complex of distributed algorithms, the exploration algorithms built upon principles of model checking centralized systems are not scalable enough to be installed in Dervish. Those approaches reduce the problem of model checking distributed systems to that of centralized systems, by using the global state, which also includes the network state, as the model checking state. This thesis introduces LMC, a novel model checking algorithm designed specifically for distributed algorithms. The key insight in LMC is to treat the local nodes' states separately, instead of keeping track of the global states. We show how Dervish equipped with LMC enables us to find bugs in some complex consensus algorithms, including PaxosInside, the first consensus algorithm proposed and implemented for manycore environments. A modern manycore architecture can be viewed as a distributed system with explicit message passing to communicate between cores. Yet, doing this efficiently is very challenging given the non-uniform latency in inter-core communication and the unpredicted core response time. This thesis explores, for the first time, the feasibility of implementing a (non-blocking) consensus algorithm in a manycore system. We present PaxosInside, a new consensus algorithm that takes up the challenges of manycore environments, such as limited bandwidth of interconnect network as well as the consensus leader. A unique characteristic of PaxosInside is the use of a single acceptor role in steady state, which in our context, significantly reduces the number of exchanged messages between replicas.

Related material