
Infoscience

 
conference paper

Replication for Send-Deterministic MPI HPC Applications

Lefray, Arnaud • Ropars, Thomas • Schiper, André
2013
FTXS '13: Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale
3rd Workshop on Fault-Tolerance for HPC at Extreme Scale

Replication has recently gained attention in the context of fault tolerance for large-scale MPI HPC applications. Existing implementations try to cover all MPI codes and to be independent of the underlying library. In this paper, we evaluate the advantages of adopting a different approach. First, we try to take advantage of a communication property common to many MPI HPC applications, namely send-determinism. Second, we choose to implement replication inside the MPI library. The main advantage of our approach is simplicity. While being only a small patch to the Open MPI library, our solution, called SDR-MPI, supports most of the main features of the MPI standard, including all collectives and group operations. SDR-MPI additionally achieves good performance: experiments run with HPC benchmarks and applications show that its overhead remains below 5%.

Type: conference paper
DOI: 10.1145/2465813.2465819
Author(s): Lefray, Arnaud; Ropars, Thomas; Schiper, André
Date Issued: 2013
Published in: FTXS '13: Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale
Start page: 33
End page: 40

Subjects: HPC • Replication • MPI • Fault tolerance

Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: LSR-IC
Event name: 3rd Workshop on Fault-Tolerance for HPC at Extreme Scale
Event place: New York City, USA
Event date: June 2013

Available on Infoscience: October 10, 2013
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/96166
