Causal distributed breakpoints

The authors define a causal distributed breakpoint, which is initiated by a sequential breakpoint in one process of a distributed computation and restores each process in the computation to its earliest state that reflects all events that happened before the breakpoint. An algorithm for finding the causal distributed breakpoint, given a sequential breakpoint in one of the processes, is presented. Approximately consistent checkpoint sets are used for efficiently restoring each process to its state in a causal distributed breakpoint. Causal distributed breakpoints assume deterministic processes that communicate solely by messages. The dependencies that arise from communication between processes are logged. Dependency logging and approximately consistent checkpoint sets are implemented on a network of SUN workstations running the V-System. Overhead on the message-passing primitives varies between 1% and 14% for dependency logging. Execution time overhead for a 200×200 Gaussian elimination is less than 4% and generates a dependency log of 288 kbytes

Presented at:
Proceedings of the Tenth International Conference on Distributed Computer Systems, May 1990

 Record created 2005-10-20, last modified 2018-03-17

Download fulltext

Rate this document:

Rate this document:
(Not yet reviewed)