
Abstract

Sender-based message logging, a low-overhead mechanism for providing transparent fault tolerance in distributed systems, is described. It differs from conventional message logging mechanisms in that each message is logged in volatile memory on the machine from which the message is sent. Keeping the message log in the sender's local memory allows the system to recover from a single failure at a time without the expense of synchronously logging each message to stable storage. The message log is then written to stable storage asynchronously, without delaying the computation, as part of the sender's periodic checkpoint. Maintaining the sender-based message log requires at most one extra network packet beyond non-fault-tolerant reliable message communication and imposes little additional synchronization delay. The mechanism can be applied transparently to existing distributed applications and does not require specialized hardware. It is currently being implemented on a network of Sun workstations.
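To make the idea concrete, here is a minimal in-memory sketch of a sender-side log. It assumes a scheme not spelled out in this abstract: the sender tags each message with a send sequence number (SSN), the receiver assigns a receive sequence number (RSN) and returns it in its acknowledgement, and the sender records that RSN in its volatile log. When the receiver fails, its messages can be replayed from the senders' logs in RSN order. The names `Process`, `LogEntry`, and `replay_for` are hypothetical. Message passing is modeled as a direct call, and checkpointing, network loss, and messages whose RSN has not yet reached the sender are all ignored.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LogEntry:
    # One message held in the sender's volatile (in-memory) log.
    receiver: str
    ssn: int                   # send sequence number assigned by the sender (assumed scheme)
    data: str
    rsn: Optional[int] = None  # receive sequence number, recorded once the receiver reports it


@dataclass
class Process:
    name: str
    next_ssn: int = 0
    next_rsn: int = 0
    send_log: List[LogEntry] = field(default_factory=list)  # volatile log, no stable-storage write
    delivered: List[str] = field(default_factory=list)      # order in which this process received messages

    def send(self, receiver: "Process", data: str) -> None:
        # Log the message locally first; nothing is written synchronously to stable storage.
        entry = LogEntry(receiver.name, self.next_ssn, data)
        self.next_ssn += 1
        self.send_log.append(entry)
        # The receiver's acknowledgement carries back the RSN it assigned; the sender records it.
        entry.rsn = receiver.receive(data)

    def receive(self, data: str) -> int:
        # Assign the next receive sequence number and deliver the message.
        rsn = self.next_rsn
        self.next_rsn += 1
        self.delivered.append(data)
        return rsn


def replay_for(failed: str, senders: List[Process]) -> List[str]:
    # Gather every logged message addressed to the failed process whose RSN is known,
    # then return the messages in the order the failed process originally received them.
    entries: List[Tuple[int, str]] = [
        (e.rsn, e.data)
        for p in senders
        for e in p.send_log
        if e.receiver == failed and e.rsn is not None
    ]
    return [data for _, data in sorted(entries)]


if __name__ == "__main__":
    a, b, c = Process("A"), Process("B"), Process("C")
    a.send(c, "x")
    b.send(c, "y")
    a.send(c, "z")
    # Suppose C now fails and loses its volatile state; A and B still hold its messages.
    print(replay_for("C", [a, b]))  # ['x', 'y', 'z']
```

Because the log lives only in the senders' memory, this approach survives one failure at a time, matching the abstract's claim. A simultaneous failure of a sender and its receiver would lose the entries the sender had not yet checkpointed.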
