Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. Conferences, Workshops, Symposiums, and Seminars
  4. Replication for Send-Deterministic MPI HPC Applications
 
conference paper

Replication for Send-Deterministic MPI HPC Applications

Lefray, Arnaud
•
Ropars, Thomas  
•
Schiper, André  
2013
FTXS '13: Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale
3rd Workshop on Fault-Tolerance for HPC at Extreme Scale

Replication has recently gained attention in the context of fault tolerance for large scale MPI HPC applications. Existing implementations try to cover all MPI codes and to be independent from the underlying library. In this paper, we evaluate the advantages of adopting a different approach. First, we try to take advantage of a communication property common to many MPI HPC application, namely send-determinism. Second, we choose to implement replication inside the MPI library. The main advantage of our approach is simplicity. While being only a small patch to the Open MPI library, our solution called SDR-MPI supports most main features of the MPI standard including all collectives and group operations. SDR-MPI additionally achieves good performance: Experiments run with HPC benchmarks and applications show that its overhead remains below 5%.

  • Files
  • Details
  • Metrics
Loading...
Thumbnail Image
Name

ftxs06-lefray.pdf

Type

Preprint

Version

http://purl.org/coar/version/c_71e4c1898caa6e32

Access type

openaccess

Size

471.85 KB

Format

Adobe PDF

Checksum (MD5)

a515f00766681fb45b27abd611d3cc92

Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés