Conference paper

Failure Detectors as First Class Objects

P. Felber, X. Defago, R. Guerraoui, and P. Oser

ABSTRACT: One of the fundamental differences between a centralized system and a distributed one is the notion of _partial failures_. The ability to detect failures efficiently and accurately is a key element underlying reliable distributed computing. In current distributed systems, however, failure detection is either left to the application developer or hidden from the programmer and provided in an ad hoc manner behind the scenes. We argue for an intermediate approach in which failure detectors are _first class objects_. We view failure detection as an abstraction whose complexity is encapsulated behind well-defined interfaces. The various roles of a failure detection service are all represented as first class objects. Following our approach, one can reuse existing failure detection protocols as they are or, through composition or refinement, define new protocols that match the application requirements. We describe an interesting result of a composition that mixes push and pull failure monitoring, and we show how scalability issues may be addressed by using a hierarchical failure detection configuration. We also discuss the implementation of our failure detection service in both CORBA and Java.
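The idea of roles as first class objects, and of a composition that mixes push and pull monitoring, can be illustrated with a minimal Java sketch. Everything here is an assumption made for illustration: the interface names (`Monitorable`, `HeartbeatListener`, `FailureDetector`), the class `PushPullDetector`, and the fallback policy (rely on pushed heartbeats, and pull only when a heartbeat is overdue) are hypothetical and are not taken from the paper's CORBA or Java interfaces. Time is passed in explicitly to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical role: an object that can be asked whether it is alive (pull style).
interface Monitorable {
    boolean isAlive();
}

// Hypothetical role: an object that receives heartbeats from monitored objects (push style).
interface HeartbeatListener {
    void heartbeat(String id, long nowMillis);
}

// Hypothetical role: an object that answers suspicion queries.
interface FailureDetector {
    boolean isSuspected(String id, long nowMillis);
}

// Assumed composition: trust pushed heartbeats while they are recent,
// and fall back to an explicit pull when a heartbeat is overdue.
final class PushPullDetector implements FailureDetector, HeartbeatListener {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();
    private final Map<String, Monitorable> monitored = new HashMap<>();

    PushPullDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Start monitoring a target; registration counts as a first sign of life.
    void monitor(String id, Monitorable target, long nowMillis) {
        monitored.put(id, target);
        lastSeen.put(id, nowMillis);
    }

    @Override
    public void heartbeat(String id, long nowMillis) {
        // Ignore heartbeats from objects that were never registered.
        if (monitored.containsKey(id)) {
            lastSeen.put(id, nowMillis);
        }
    }

    @Override
    public boolean isSuspected(String id, long nowMillis) {
        Monitorable target = monitored.get(id);
        if (target == null) {
            return true; // not monitored: treated as suspected in this sketch
        }
        if (nowMillis - lastSeen.get(id) <= timeoutMillis) {
            return false; // a recent heartbeat was pushed to us
        }
        boolean alive;
        try {
            alive = target.isAlive(); // heartbeat overdue: pull explicitly
        } catch (RuntimeException e) {
            alive = false; // a failed call is treated as no answer
        }
        if (alive) {
            lastSeen.put(id, nowMillis);
        }
        return !alive;
    }
}
```

Because each role is a separate interface, a detector under this sketch could itself be wrapped as a `Monitorable` and registered with a higher-level detector, which is one possible reading of the hierarchical configuration mentioned in the abstract.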

    Reference

    • LSR-CONF-1999-005

    Record created on 2005-05-20, modified on 2016-08-08
