Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. Conferences, Workshops, Symposiums, and Seminars
  4. High-Performance RMA-Based Broadcast on the Intel SCC
 
conference paper

High-Performance RMA-Based Broadcast on the Intel SCC

Petrovic, Darko  
•
Shahmirzadi, Omid  
•
Ropars, Thomas  
Show more
2012
SPAA '12: Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
24th ACM Symposium on Parallelism in Algorithms and Architectures

Many-core chips with more than 1000 cores are expected by the end of the decade. To overcome scalability issues related to cache coherence at such a scale, one of the main research directions is to leverage the message-passing programming model. The Intel Single-Chip Cloud Computer (SCC) is a prototype of a message-passing many-core chip. It offers the ability to move data between on-chip Message Passing Buffers (MPB) using Remote Memory Access (RMA). Performance of message-passing applications is directly affected by efficiency of collective operations, such as broadcast. In this paper, we study how to make use of the MPBs to implement an efficient broadcast algorithm for the SCC. We propose OC-Bcast (On-Chip Broadcast), a pipelined k-ary tree algorithm tailored to exploit the parallelism provided by on-chip RMA. Using a LogP-based model, we present an analytical evaluationthat compares our algorithm with the state-of-the-art broadcast algorithms implemented for the SCC. As predicted by the model, experimental results show that OC-Bcast attains almost three times better throughput, and improves latency by at least 27%. Furthermore, the analytical evaluation highlights the benefits of our approach: OC-Bcast takes direct advantage of RMA, unlike the other considered broadcast algorithms, which are based on a higher-level send/receive interface. This leads us to the conclusion that RMA-based collective operations are needed to take full advantage of hardware features of future message-passing many-core architectures.

  • Files
  • Details
  • Metrics
Type
conference paper
DOI
10.1145/2312005.2312029
Author(s)
Petrovic, Darko  
Shahmirzadi, Omid  
Ropars, Thomas  
Schiper, André  
Date Issued

2012

Published in
SPAA '12: Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Start page

121

End page

130

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
LSR-IC  
Event nameEvent placeEvent date
24th ACM Symposium on Parallelism in Algorithms and Architectures

Pittsburgh, Pennsylvania, USA

June 25-27, 2012

Available on Infoscience
April 28, 2012
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/79761
Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés