Infoscience

research article

An optimisation of allreduce communication in message-passing systems

Jocksch, Andreas • Ohana, Noe • Lanti, Emmanuel • et al.
October 1, 2021
Parallel Computing

Collective communication, specifically the allreduce pattern in message-passing systems, is optimised based on measurements taken at the installation time of the library. The algorithms are selected in an initialisation phase of the communication, as so-called persistent collective communication introduced in the message passing interface (MPI) standard. Our allreduce algorithms build on the patterns reduce_scatter and allgatherv, which are also considered standalone. For short messages, the existing cyclic shift algorithm (Bruck's algorithm) is applied with a prefix operation. For long messages, our allreduce algorithm is based on reduce_scatter and allgatherv, where the cyclic shift algorithm is applied with a flexible number of communication ports per node. The algorithms for equal message sizes are also used for non-equal message sizes, together with a heuristic for rank reordering. Medium message sizes are communicated with an incomplete reduce_scatter followed by allgatherv. Furthermore, an optional recursive application of the cyclic shift algorithm is applied. All algorithms operate at the node level: the data is gathered and scattered by the cores within a node, and the communication algorithms are applied across the nodes. In general, our approach outperforms the non-persistent counterpart in established MPI libraries by up to one order of magnitude or shows equal performance, with a few exceptions for certain numbers of nodes and message sizes.
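The long-message structure the abstract describes — an allreduce decomposed into a reduce_scatter phase followed by an allgatherv phase — can be sketched as follows. This is a minimal sequential simulation of the data flow, not the authors' tuned MPI implementation: the function name and the assumption of equal, evenly divisible buffer sizes are illustrative only, and a real implementation would use MPI calls (e.g. MPI_Reduce_scatter and MPI_Allgatherv) across processes.

```python
def allreduce_sum(buffers):
    """Simulate allreduce (sum) over per-rank buffers of equal length,
    using the reduce_scatter + allgather decomposition.

    buffers: list of p lists, one per rank, all of the same length n.
    Returns the resulting buffer held by every rank after the allreduce.
    """
    p = len(buffers)      # number of ranks
    n = len(buffers[0])   # elements per rank
    chunk = n // p        # assume n divisible by p for simplicity

    # Phase 1: reduce_scatter -- after this phase, rank r owns the
    # fully reduced chunk covering indices [r*chunk, (r+1)*chunk).
    owned = []
    for r in range(p):
        lo, hi = r * chunk, (r + 1) * chunk
        owned.append([sum(buf[i] for buf in buffers) for i in range(lo, hi)])

    # Phase 2: allgather -- every rank collects all reduced chunks,
    # reassembling the complete reduced buffer.
    result = [x for ch in owned for x in ch]
    return [list(result) for _ in range(p)]
```

For example, with two ranks holding `[1, 2, 3, 4]` and `[10, 20, 30, 40]`, every rank ends up with `[11, 22, 33, 44]`. The point of the decomposition is that each rank performs the reduction for only its own chunk, so the reduction work and bandwidth are spread across all ranks instead of funnelled through a root.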

Type: research article
DOI: 10.1016/j.parco.2021.102812
Web of Science ID: WOS:000709130200004
Author(s): Jocksch, Andreas; Ohana, Noe; Lanti, Emmanuel; Koutsaniti, Eirini; Karakasis, Vasileios; Villard, Laurent
Date Issued: 2021-10-01
Published in: Parallel Computing
Volume: 107
Article Number: 102812

Subjects: Computer Science, Theory & Methods • Computer Science • mpi • collective communication • allgatherv • reduce_scatter • allreduce • to-all communications • algorithms

Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: SPC
Available on Infoscience: November 6, 2021
Identifier for this record: https://infoscience.epfl.ch/handle/20.500.14299/182747