Infoscience

 
conference paper

Fault Tolerance in the Parareal Method

Nielsen, Allan Svejstrup • Hesthaven, Jan S.
2016
Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale (FTXS'16)
ACM Workshop on Fault-Tolerance for HPC at Extreme Scale (FTXS)

Parallel-in-time integration is an often-advocated approach for extracting parallelism in the solution of PDEs beyond what is possible using spatial domain decomposition techniques. Due to the comparatively low parallel efficiency of parallel-in-time integration techniques, they are primarily of interest as an extension of classical approaches to parallelism. As such, potential applications are expected to scale across several hundreds, or possibly thousands, of nodes, making algorithmic resilience towards hardware-induced errors highly relevant. In this work we develop a scheduling scheme for the parareal algorithm that is resilient to node loss. The fault-tolerant scheme is based on a popular approach introduced by E. Aubanel in [1], modified with a set of MPI interface extensions for implementing recovery strategies available in the ULFM framework. In addition, we demonstrate how the parareal algorithm may be made resilient towards Silent-Data-Corruption (SDC) errors by viewing it as a point-iterative method, locally monitoring the residual between consecutive iterations so as to discard potentially corrupt iterations.
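To make the SDC idea in the abstract concrete, the following is a minimal serial sketch of a parareal iteration with a local consistency check. It is an illustration under stated assumptions, not the scheme from the paper: the test problem (du/dt = -u), the forward Euler propagators, the `growth_limit` and `floor` thresholds, the recompute-once policy, and the `make_faulty_fine` fault injector are all hypothetical choices made for this example. The check relies on the fact that, for a converging iteration, the change in a slice's fine result between consecutive iterations should shrink, so a sudden large jump is treated as suspicious and the solve is repeated.

```python
LAMBDA = -1.0  # assumed test problem: du/dt = LAMBDA * u


def euler(u, t0, t1, steps):
    """Integrate du/dt = LAMBDA * u from t0 to t1 with forward Euler."""
    dt = (t1 - t0) / steps
    for _ in range(steps):
        u = u + dt * LAMBDA * u
    return u


def coarse(u, t0, t1):
    """Cheap, inaccurate propagator G (one Euler step)."""
    return euler(u, t0, t1, 1)


def fine(u, t0, t1):
    """Expensive, accurate propagator F (100 Euler steps)."""
    return euler(u, t0, t1, 100)


def make_faulty_fine(fault_call, offset=1.0):
    """Hypothetical fault injector: corrupts the result of one fine solve."""
    calls = {"n": 0}

    def solver(u, t0, t1):
        result = fine(u, t0, t1)
        if calls["n"] == fault_call:
            result += offset  # simulated silent data corruption
        calls["n"] += 1
        return result

    return solver


def parareal(u0, t_end, n_slices, n_iters, fine_solver=fine,
             growth_limit=10.0, floor=1e-12):
    """Parareal with a per-slice jump check (illustrative thresholds)."""
    ts = [t_end * n / n_slices for n in range(n_slices + 1)]

    # Initial guess: one serial sweep with the coarse propagator.
    U = [u0]
    for n in range(n_slices):
        U.append(coarse(U[n], ts[n], ts[n + 1]))

    prev_F = [None] * n_slices     # fine result from the previous iteration
    prev_jump = [None] * n_slices  # |F_k - F_(k-1)| from the previous iteration
    rejected = 0

    for _ in range(n_iters):
        # Fine solves: independent per slice, so parallel in a real code.
        F = []
        for n in range(n_slices):
            val = fine_solver(U[n], ts[n], ts[n + 1])
            if prev_F[n] is not None:
                jump = abs(val - prev_F[n])
                limit = None
                if prev_jump[n] is not None:
                    limit = growth_limit * max(prev_jump[n], floor)
                if limit is not None and jump > limit:
                    # Jump grew instead of shrinking: discard and recompute once.
                    rejected += 1
                    val = fine_solver(U[n], ts[n], ts[n + 1])
                    jump = abs(val - prev_F[n])
                prev_jump[n] = jump
            prev_F[n] = val
            F.append(val)

        G_old = [coarse(U[n], ts[n], ts[n + 1]) for n in range(n_slices)]

        # Serial correction: U_(n+1) = G(U_n_new) + F(U_n_old) - G(U_n_old).
        U_new = [u0]
        for n in range(n_slices):
            U_new.append(coarse(U_new[n], ts[n], ts[n + 1]) + F[n] - G_old[n])
        U = U_new

    return U, rejected


if __name__ == "__main__":
    clean, _ = parareal(1.0, 2.0, 8, 8)
    faulty, n_rejected = parareal(1.0, 2.0, 8, 8,
                                  fine_solver=make_faulty_fine(20))
    print("rejected fine solves:", n_rejected)
    print("max difference to clean run:",
          max(abs(a - b) for a, b in zip(clean, faulty)))
```

Note that this simple check needs two earlier iterations of history before it can flag anything, so corruption in the first two iterations would pass unnoticed here. It also only guards the fine solve, not the coarse sweep or the communicated values, which a production scheme would have to cover as well.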

Type: conference paper
DOI: 10.1145/2909428.2909431
Web of Science ID: WOS:000383740400001
Author(s): Nielsen, Allan Svejstrup; Hesthaven, Jan S.
Date Issued: 2016
Publisher: Assoc Computing Machinery
Publisher place: New York
Published in: Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale (FTXS'16)
ISBN of the book: 978-1-4503-4349-7
Total of pages: 8
Start page: 1
End page: 8
Subjects: Resilience • Parallel-in-time • Parareal • Exascale • HPC • Fault-tolerance • Silent-Data-Corruption • Parallel Computing
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: MCSS
Event name: ACM Workshop on Fault-Tolerance for HPC at Extreme Scale (FTXS)
Available on Infoscience: February 24, 2016
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/124392