Decentralized in-order execution of a sequential task-based code for shared-memory architectures

Castes, Charly • Agullo, Emmanuel • Aumage, Olivier • Saillard, Emmanuelle
January 1, 2022
2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2022)
36th IEEE International Parallel and Distributed Processing Symposium (IEEE IPDPS)

The hardware complexity of modern machines makes the design of adequate programming models crucial for jointly ensuring performance, portability, and productivity in high-performance computing (HPC). Sequential task-based programming models paired with advanced runtime systems allow the programmer to write a sequential algorithm, independently of the hardware architecture, in a productive and portable manner, and let a third-party software layer (the runtime system) handle the burden of scheduling a correct, parallel execution of that algorithm to ensure performance. Many HPC algorithms have been successfully implemented following this paradigm, a testament to its effectiveness.
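
To make the paradigm concrete, the sketch below shows roughly what sequential task-based code looks like. It is illustrative only: TaskRuntime, submit_task, wait_all and Access are names invented for this example (real runtime systems expose analogous but different APIs), and the toy runtime simply executes each task as soon as it is submitted rather than scheduling it in parallel. The point is that the application is a plain sequential loop that declares tasks together with the data they read and write; scheduling is entirely the runtime's concern.

    // Illustrative only: TaskRuntime, submit_task and wait_all are hypothetical
    // names invented for this sketch; real runtime systems expose analogous but
    // different interfaces. This toy runtime runs each task immediately, in
    // submission order, instead of scheduling independent tasks in parallel.
    #include <cstdio>
    #include <functional>
    #include <utility>
    #include <vector>

    enum class Access { READ, WRITE };

    struct TaskRuntime {
        // The application submits tasks in sequential (program) order together
        // with the data they access; a real runtime would use the access modes
        // to infer dependencies and run independent tasks concurrently.
        void submit_task(std::function<void()> body,
                         const std::vector<std::pair<double*, Access>>& /*accesses*/) {
            body();
        }
        void wait_all() {}  // a real runtime would block until all tasks finish
    };

    int main() {
        constexpr int nb_blocks = 4, block = 256;
        std::vector<double> x(nb_blocks * block, 1.0), y(nb_blocks * block, 2.0);
        TaskRuntime rt;

        // Sequential task flow: one AXPY task per block. The code reads like a
        // plain sequential loop; parallelism is the runtime's concern.
        for (int b = 0; b < nb_blocks; ++b) {
            double* xb = &x[b * block];
            double* yb = &y[b * block];
            rt.submit_task(
                [=] { for (int i = 0; i < block; ++i) yb[i] += 2.0 * xb[i]; },
                {{xb, Access::READ}, {yb, Access::WRITE}});
        }
        rt.wait_all();
        std::printf("y[0] = %f\n", y[0]);  // 2 + 2 * 1 = 4
        return 0;
    }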

However, developing algorithms that specifically require fine-grained tasks with this model is still considered prohibitive due to per-task management overhead [1], forcing the programmer to resort to a less abstract, and hence more complex, "task+X" model. We thus investigate the possibility of offering a tailored execution model that trades dynamic mapping for efficiency by using a decentralized, conservative in-order execution of the task flow, while preserving the benefits of relying on the sequential task-based programming model. We propose a formal specification of the execution model as well as a prototype implementation, which we assess on a shared-memory multicore architecture with several synthetic workloads. The results show that, provided the programmer supplies a proper task mapping, the pressure on the runtime system is significantly reduced and the execution of fine-grained task flows is much more efficient.
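
The general idea of a decentralized, conservative in-order execution can be pictured with the sketch below. It is a loose illustration, not the authors' specification or implementation, and it rests on assumptions made for the example: tasks carry a programmer-supplied static mapping to workers, dependencies are recorded conservatively per data block at submission time, and every worker walks the same submission-ordered task flow, executing only its own tasks in order while spin-waiting on the completion flags of earlier tasks it depends on. No centralized scheduler or shared ready queue is involved during execution. All names (Task, worker_loop, ...) are invented here.

    // Illustrative sketch only (not the paper's runtime): conservative,
    // decentralized in-order execution of a sequential task flow with a
    // programmer-supplied static mapping of tasks to workers.
    #include <atomic>
    #include <cstdio>
    #include <functional>
    #include <map>
    #include <memory>
    #include <thread>
    #include <vector>

    struct Task {
        std::function<void()> body;
        int worker;             // static mapping chosen by the programmer
        std::vector<int> deps;  // indices of earlier tasks touching the same data
    };

    int main() {
        constexpr int nb_workers = 2, nb_blocks = 4, block = 1024;
        std::vector<double> v(nb_blocks * block, 1.0);
        std::vector<double> partial(nb_blocks, 0.0);

        // Build the sequential task flow: for each block, a "scale" task
        // followed by a "sum" task on the same data, mapped onto workers.
        std::vector<Task> flow;
        std::map<double*, int> last_task_on;  // conservative: last task per block
        for (int b = 0; b < nb_blocks; ++b) {
            double* vb = &v[b * block];
            auto dep_of = [&](double* h) {
                std::vector<int> d;
                auto it = last_task_on.find(h);
                if (it != last_task_on.end()) d.push_back(it->second);
                return d;
            };
            flow.push_back({[=] { for (int i = 0; i < block; ++i) vb[i] *= 3.0; },
                            b % nb_workers, dep_of(vb)});
            last_task_on[vb] = (int)flow.size() - 1;
            double* out = &partial[b];
            flow.push_back({[=] { double s = 0;
                                  for (int i = 0; i < block; ++i) s += vb[i];
                                  *out = s; },
                            (b + 1) % nb_workers, dep_of(vb)});
            last_task_on[vb] = (int)flow.size() - 1;
        }

        // Decentralized execution: every worker walks the *same* ordered flow,
        // runs only its own tasks, and before each task spin-waits on the
        // completion flags of the earlier tasks it depends on.
        std::unique_ptr<std::atomic<bool>[]> done(new std::atomic<bool>[flow.size()]);
        for (size_t i = 0; i < flow.size(); ++i) done[i].store(false);

        auto worker_loop = [&](int me) {
            for (size_t i = 0; i < flow.size(); ++i) {
                if (flow[i].worker != me) continue;
                for (int d : flow[i].deps)
                    while (!done[d].load(std::memory_order_acquire)) { /* spin */ }
                flow[i].body();
                done[i].store(true, std::memory_order_release);
            }
        };
        std::vector<std::thread> pool;
        for (int w = 0; w < nb_workers; ++w) pool.emplace_back(worker_loop, w);
        for (auto& t : pool) t.join();

        double total = 0;
        for (double s : partial) total += s;
        std::printf("total = %f (expected %f)\n", total, 3.0 * nb_blocks * block);
        return 0;
    }

Because a task only ever waits on tasks that appear earlier in the flow, and each worker processes the flow in order, this scheme is deadlock-free; its efficiency, as the abstract notes, hinges on the quality of the task mapping supplied by the programmer.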

Details
Type
conference paper
DOI
10.1109/IPDPSW55747.2022.00095
Web of Science ID
WOS:000855041000069
Author(s)
Castes, Charly
Agullo, Emmanuel
Aumage, Olivier
Saillard, Emmanuelle
Date Issued
2022-01-01
Publisher
IEEE Computer Society
Publisher place
Los Alamitos
Published in
2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2022)
ISBN of the book
978-1-6654-9747-3
Series title/Series vol.
IEEE International Symposium on Parallel and Distributed Processing Workshops
Start page
552
End page
561
Subjects
Computer Science, Hardware & Architecture • Computer Science, Theory & Methods • Computer Science
Editorial or Peer reviewed
REVIEWED

Written at
EPFL
EPFL units
DCSL
Event name
36th IEEE International Parallel and Distributed Processing Symposium (IEEE IPDPS)
Event place
ELECTR NETWORK (online)
Event date
May 30-Jun 03, 2022

Available on Infoscience
October 10, 2022
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/191312