Infoscience

book part or chapter

Fault-tolerant parallel applications with dynamic parallel schedules: a programmer's perspective

Gerlach, S. • Schaeli, B. • Hersch, R. D.
2006
Dependable Systems: Software, Computing, Networks. Research Results of the DICS Program

Dynamic parallel schedules (DPS) is a flow-graph-based framework for developing parallel applications on clusters of workstations. The DPS flow graph execution model enables automatic pipelined parallel execution of applications. DPS supports graceful degradation of parallel applications in case of node failures. The fault-tolerance mechanism relies on a set of backup threads stored in the volatile storage of alternate nodes; these are kept up to date both by duplicating transmitted data objects and by periodic checkpointing. The current state of a failed node can be reconstructed on its backup threads by re-executing the application from the last checkpoint. A valid execution order is automatically deduced from the flow graph. Adding fault tolerance to a DPS application requires only minor changes to the application's source code. The present contribution focuses on the development of fault-tolerant parallel applications with DPS from a programmer's perspective.
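
The recovery scheme summarized in the abstract (duplicated data objects plus periodic checkpoints, followed by re-execution from the last checkpoint) can be illustrated with a small toy model. The sketch below is not the DPS API and is not taken from the chapter: the names WorkerThread, BackupThread, and run_with_failure are hypothetical, the "threads" are plain Python objects, and a single ordered input stream is assumed so that the replay order is trivially valid.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class WorkerThread:
    """Toy stateful thread (hypothetical): keeps a running sum of data objects."""
    total: int = 0
    processed: int = 0

    def process(self, data_object: int) -> None:
        self.total += data_object
        self.processed += 1


@dataclass
class BackupThread:
    """Backup held in the memory of an alternate node (hypothetical model)."""
    checkpoint: WorkerThread = field(default_factory=WorkerThread)
    # Data objects duplicated to the backup since the last checkpoint.
    pending: list = field(default_factory=list)

    def duplicate(self, data_object: int) -> None:
        self.pending.append(data_object)

    def store_checkpoint(self, state: WorkerThread) -> None:
        # A checkpoint supersedes every data object logged before it.
        self.checkpoint = copy.deepcopy(state)
        self.pending.clear()

    def recover(self) -> WorkerThread:
        # Rebuild the failed thread's state: start from the last checkpoint
        # and re-execute the data objects received since then.
        restored = copy.deepcopy(self.checkpoint)
        for data_object in self.pending:
            restored.process(data_object)
        return restored


def run_with_failure(data_objects, checkpoint_every, fail_after):
    """Process data objects, simulate a failure, and return the recovered state."""
    active = WorkerThread()
    backup = BackupThread()
    for index, data_object in enumerate(data_objects, start=1):
        backup.duplicate(data_object)   # each data object is also sent to the backup
        active.process(data_object)
        if index % checkpoint_every == 0:
            backup.store_checkpoint(active)
        if index == fail_after:
            # The active thread is lost here; only the backup survives.
            return backup.recover()
    return active


if __name__ == "__main__":
    recovered = run_with_failure([1, 2, 3, 4, 5], checkpoint_every=2, fail_after=5)
    print(recovered.total, recovered.processed)
```

In this toy model the recovered state matches what the failed thread had computed, because the backup replays exactly the data objects that arrived after the last checkpoint. A real flow graph has several threads and communication paths, which is where deducing a valid execution order from the flow graph, as the abstract describes, becomes necessary.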

Type
book part or chapter
DOI
10.1007/11808107_9
Web of Science ID

WOS:000240040000009

Author(s)
Gerlach, S.  
Schaeli, B.  
Hersch, R. D.  
Date Issued

2006

Publisher

Springer-Verlag

Published in
Dependable Systems: Software, Computing, Networks. Research Results of the DICS Program
Start page

195

End page

210

Subjects

checkpointing • flow graphs • multi-threading • pipeline processing • processor scheduling • software fault tolerance

Written at

EPFL

EPFL units
LSP  
Available on Infoscience
January 31, 2007
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/240263