A new look at atomic broadcast in the asynchronous crash-recovery model
Atomic broadcast in particular, and group communication in general, have mainly been specified and implemented in a system model where processes do not recover after a crash. The model is called crash-stop. The drawback of this model is its inability to express algorithms that tolerate the crash of a majority of processes. This has led to extend the crash-stop model to the so-called crash-recovery model, in which processes have access to stable storage, to log their state periodically. This allows them to recover a previous state after a crash. However, the existing specifications of atomic broadcast in the crash-recovery model are not satisfactory, and the paper explains why. The paper also proposes a new specification of atomic broadcast in the crash-recovery model that addresses these issues. Specifically, our new specification allows to distinguish between a uniform and a non-uniform version of atomic broadcast. The non-uniform version logs less information, and is thus more efficient. The uniform and non-uniform atomic broadcast have been implemented and compared with a published atomic broadcast algorithm. Performance results are presented.