Guerraoui, Rachid
Herlihy, Maurice
Kouznetsov, Petr
Lynch, Nancy
Newport, Calvin
On the Weakest Failure Detector Ever
Proceedings of the 26th ACM Symposium on Principles of Distributed Computing (PODC'07)
Many problems in distributed computing are impossible when no information about process failures is available. So what is the minimal yet non-trivial failure information? In other words, what is the minimal information about failures needed to circumvent any impossibility and sufficient to circumvent some impossibility? This paper proposes a candidate abstraction, denoted Υ, to capture this failure information. In every run of the distributed system, Υ eventually informs the processes that some set of processes in the system cannot be the set of correct processes in that run. Although seemingly weak, for it might provide random information for an arbitrarily long period of time, and it only excludes one possibility of correct set among many, Υ still captures non-trivial failure information. We show that Υ is sufficient to circumvent the celebrated wait-free set-agreement impossibility. While doing so, we (a) disprove previous conjectures about the weakest failure detector to solve set-agreement and we (b) prove that solving set-agreement with registers is strictly weaker than solving (n+1)-process consensus using n-process consensus. We prove that Υ is, in some sense, necessary to circumvent any wait-free impossibility. As a corollary, set-agreement is, from a failure information perspective, a minimal wait-free impossible problem in distributed computing. Our results are generalized through an abstraction Υ^f that we introduce and prove necessary to solve any problem that cannot be solved in an f-resilient manner, and yet sufficient to solve f-resilient f-set-agreement.
2007