Comparison of Failure Detectors and Group Membership: Performance Study of Two Atomic Broadcast Algorithms

Protocols that solve agreement problems are essential building blocks for fault-tolerant distributed systems. While many protocols have been published, little has been done to analyze their performance, especially the performance of their fault tolerance mechanisms. In this paper, we present a performance evaluation methodology that can be generalized to analyze many kinds of fault-tolerant algorithms. We use the methodology to compare two atomic broadcast algorithms with different fault tolerance mechanisms: unreliable failure detectors and group membership. We evaluate the steady-state latency in (1) runs with no crashes and no suspicions, (2) runs with crashes, and (3) runs with no crashes in which correct processes are wrongly suspected to have crashed, as well as (4) the transient latency after a crash. We find that the two algorithms have the same performance in Scenario 1, and that the group-membership-based algorithm has an advantage in terms of performance and resiliency in Scenario 2, whereas the failure-detector-based algorithm offers better performance in Scenarios 3 and 4. We discuss the implications of our results for the design of fault-tolerant distributed systems.
