Causal distributed breakpoints

Fowler, Jerry; Zwaenepoel, Willy

conference paper not in proceedings

1990

Proceedings of the Tenth International Conference on Distributed Computing Systems

The authors define a causal distributed breakpoint, which is initiated by a sequential breakpoint in one process of a distributed computation and restores each process in the computation to its earliest state that reflects all events that happened before the breakpoint. They present an algorithm for finding the causal distributed breakpoint, given a sequential breakpoint in one of the processes. Approximately consistent checkpoint sets are used to restore each process efficiently to its state in a causal distributed breakpoint. Causal distributed breakpoints assume deterministic processes that communicate solely by messages. The dependencies that arise from communication between processes are logged. Dependency logging and approximately consistent checkpoint sets are implemented on a network of SUN workstations running the V-System. Dependency logging adds between 1% and 14% overhead to the message-passing primitives. For a 200×200 Gaussian elimination, execution time overhead is less than 4%, and the run generates a dependency log of 288 kbytes.
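
The idea of restoring each process to the earliest state that reflects everything that happened before the breakpoint can be illustrated with a small sketch. This is not the algorithm from the paper. It is a minimal illustration under assumed conventions: the log layout (`dep_log`), the function name `causal_breakpoint`, and the 1-based event indices (with 0 meaning the initial state) are all hypothetical. Each logged dependency records that a receive event in one process depends on a send event in another, and the sketch follows those dependencies transitively back from the sequential breakpoint.

```python
from collections import deque

def causal_breakpoint(dep_log, proc, event):
    """Return {process: event index} for a hypothetical causal breakpoint.

    dep_log[q] is an assumed log format: a list of
    (recv_index, sender, send_index) tuples, meaning that event
    recv_index of process q received a message sent at event
    send_index of process sender. Index 0 denotes the initial state.
    """
    # Start every process at its initial state, then pin the process
    # that hit the sequential breakpoint.
    cut = {q: 0 for q in dep_log}
    cut[proc] = event

    work = deque([proc])
    while work:
        q = work.popleft()
        limit = cut[q]
        for recv_index, sender, send_index in dep_log.get(q, []):
            # Only receives at or before q's position happened before
            # the breakpoint; each one pulls in the matching send.
            if recv_index <= limit and send_index > cut.get(sender, 0):
                cut[sender] = send_index
                work.append(sender)  # rescan sender with its new position
    return cut

# Assumed example log for three processes A, B, and C.
dep_log = {
    "A": [(3, "B", 2)],  # A's event 3 received B's event 2
    "B": [(1, "C", 4)],  # B's event 1 received C's event 4
    "C": [(5, "A", 1)],  # C's event 5 received A's event 1
}

print(causal_breakpoint(dep_log, "A", 3))  # {'A': 3, 'B': 2, 'C': 4}
```

In this example, a breakpoint at event 3 of A depends directly on event 2 of B and, through B's earlier receive, on event 4 of C. C's own event 5 is later than anything the breakpoint depends on, so it is excluded.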
