Comparison of Failure Detectors and Group Membership: Performance Study of Two Atomic Broadcast Algorithms (extended version)

Urbán, Péter; Shnayderman, Ilya; Schiper, André

report

2003

Protocols that solve agreement problems are essential building blocks for fault-tolerant distributed systems. While many protocols have been published, little has been done to analyze their performance, especially the performance of their fault tolerance mechanisms. In this paper, we present a performance evaluation methodology that can be generalized to analyze many kinds of fault-tolerant algorithms. We use the methodology to compare two atomic broadcast algorithms with different fault tolerance mechanisms: unreliable failure detectors and group membership. We evaluated the steady-state latency in (1) runs with neither crashes nor suspicions, (2) runs with crashes, and (3) runs with no crashes in which correct processes are wrongly suspected to have crashed, as well as (4) the transient latency after a crash. We found that the two algorithms have the same performance in Scenario 1 and that the group-membership-based algorithm has an advantage in terms of performance and resiliency in Scenario 2, whereas the failure-detector-based algorithm offers better performance in the other scenarios. We discuss the implications of our results for the design of fault-tolerant distributed systems.

Name

IC_TECH_REPORT_200315.pdf

Access type

openaccess

Size

233.36 KB

Format

Adobe PDF

Checksum (MD5)

a11ba883f750cc0445a3f6a9d6eb339f