research article

The RECIPE approach to challenges in deeply heterogeneous high performance systems

Agosta, Giovanni; Fornaciari, William; Atienza, David; et al.
September 1, 2020
Microprocessors And Microsystems

RECIPE (REliable power and time-ConstraInts-aware Predictive management of heterogeneous Exascale systems) is a recently started project funded within the H2020 FETHPC programme, which is expressly targeted at exploring new High-Performance Computing (HPC) technologies. RECIPE aims to introduce a hierarchical runtime resource management infrastructure that optimizes energy efficiency and minimizes the occurrence of thermal hotspots, while enforcing the time constraints imposed by the applications and ensuring reliability for both time-critical and throughput-oriented computations that run on deeply heterogeneous accelerator-based systems. This paper presents a detailed overview of RECIPE, identifying the fundamental challenges as well as the key innovations addressed by the project. In particular, the need for predictive reliability approaches to maximize hardware lifetime and guarantee application performance is identified as the key concern for RECIPE. We address it through hierarchical resource management of the heterogeneous architectural components of the system, driven by estimates of the application latency and hardware reliability obtained, respectively, through timing analysis and through modeling of the thermal properties and mean time to failure of subsystems. We show the impact of prediction accuracy on the overheads imposed by the checkpointing policy, as well as a possible application to a weather forecasting use case. (c) 2020 Elsevier B.V. All rights reserved.
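
The link the abstract draws between prediction accuracy and checkpointing overhead can be illustrated with a small sketch. This record does not state which checkpointing model the paper uses, so the code below assumes Young's classic first-order approximation for the checkpoint interval, sqrt(2 * C * MTTF), as a stand-in. The function names, the 60-second checkpoint cost, and the 24-hour mean time to failure are hypothetical values chosen only for illustration.

```python
import math


def young_interval(checkpoint_cost_s: float, mttf_s: float) -> float:
    """Young's first-order approximation of the best checkpoint interval."""
    return math.sqrt(2.0 * checkpoint_cost_s * mttf_s)


def overhead_fraction(interval_s: float, checkpoint_cost_s: float,
                      true_mttf_s: float) -> float:
    """First-order fraction of run time lost to checkpointing and rework.

    The first term is the time spent writing checkpoints; the second is the
    expected recomputation after a failure (half an interval on average).
    """
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * true_mttf_s)


def overhead_with_predicted_mttf(checkpoint_cost_s: float, true_mttf_s: float,
                                 predicted_mttf_s: float) -> float:
    """Overhead when the interval is tuned to a (possibly wrong) predicted MTTF."""
    interval = young_interval(checkpoint_cost_s, predicted_mttf_s)
    return overhead_fraction(interval, checkpoint_cost_s, true_mttf_s)


if __name__ == "__main__":
    cost_s = 60.0            # hypothetical checkpoint cost
    true_mttf_s = 86400.0    # hypothetical true mean time to failure (24 h)
    for error_factor in (0.25, 0.5, 1.0, 2.0, 4.0):
        predicted = error_factor * true_mttf_s
        overhead = overhead_with_predicted_mttf(cost_s, true_mttf_s, predicted)
        print(f"predicted/true MTTF = {error_factor:4.2f} -> overhead = {overhead:.4f}")
```

Under this simplified model, an accurate prediction gives an overhead of sqrt(2 * C / MTTF), and a prediction that is off by a factor of four in either direction raises the overhead by 25 percent. The paper's own analysis may use a different failure model and policy.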

Type: research article
DOI: 10.1016/j.micpro.2020.103185
Web of Science ID: WOS:000571471000011

Author(s)
Agosta, Giovanni
Fornaciari, William
Atienza, David  
Canal, Ramon
Cilardo, Alessandro
Flich Cardo, Jose
Hernandez Luz, Carles
Kulczewski, Michal
Massari, Giuseppe
Tornero Gavila, Rafael
et al.
Date Issued: 2020-09-01
Publisher: ELSEVIER
Published in: Microprocessors And Microsystems
Volume: 77
Article Number: 103185

Subjects: Computer Science, Hardware & Architecture; Computer Science, Theory & Methods; Engineering, Electrical & Electronic; Computer Science; Engineering; hpc; heterogeneous computing; run-time management

Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: ESL
Available on Infoscience: October 7, 2020
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/172269