Infoscience
conference paper

Boosting Asynchronous Decentralized Learning with Model Fragmentation

Biswas, Sayan • Kermarrec, Anne-Marie • Marouani, Alexis • Pereira Pires, Rafael • Sharma, Rishi • de Vos, Marinus Abraham
January 29, 2025
Proceedings of the ACM Web Conference 2025 (WWW '25)
The ACM Web Conference 2025

Decentralized learning (DL) is an emerging technique that allows nodes on the web to collaboratively train machine learning models without sharing raw data. Dealing with stragglers, i.e., nodes with slower compute or communication than others, is a key challenge in DL. We present DivShare, a novel asynchronous DL algorithm that achieves fast model convergence in the presence of communication stragglers. DivShare achieves this by having nodes fragment their models into parameter subsets and send, in parallel to computation, each subset to a random sample of other nodes instead of sequentially exchanging full models. The transfer of smaller fragments allows more efficient usage of the collective bandwidth and enables nodes with slow network links to quickly contribute with at least some of their model parameters. By theoretically proving the convergence of DivShare, we provide, to the best of our knowledge, the first formal proof of convergence for a DL algorithm that accounts for the effects of asynchronous communication with delays. We experimentally evaluate DivShare against two state-of-the-art DL baselines, AD-PSGD and Swift, and with two standard datasets, CIFAR-10 and MovieLens. We find that DivShare with communication stragglers lowers time-to-accuracy by up to 3.9x compared to AD-PSGD on the CIFAR-10 dataset. Compared to baselines, DivShare also achieves up to 19.4% better accuracy and 9.5% lower test loss on the CIFAR-10 and MovieLens datasets, respectively.
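The core mechanism of DivShare — splitting a model into parameter subsets and sending each subset to a random sample of peers, rather than exchanging full models sequentially — can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the contiguous splitting, and the uniform peer sampling are illustrative assumptions.

```python
import random

import numpy as np


def fragment_model(params, num_fragments):
    """Split a flat parameter vector into contiguous fragments (illustrative)."""
    return np.array_split(params, num_fragments)


def disseminate(node_params, peers, num_fragments, sample_size, rng):
    """Queue each fragment for a random sample of peers (simulated send)."""
    fragments = fragment_model(node_params, num_fragments)
    outbox = []
    for idx, frag in enumerate(fragments):
        # Each fragment goes to its own random subset of peers,
        # so a slow link delays only some fragments, not the whole model.
        targets = rng.sample(peers, k=sample_size)
        for peer in targets:
            outbox.append((peer, idx, frag))
    return outbox


rng = random.Random(42)
params = np.arange(12.0)  # toy model with 12 parameters
msgs = disseminate(params, peers=["n1", "n2", "n3"],
                   num_fragments=4, sample_size=2, rng=rng)
# 4 fragments x 2 recipients each = 8 messages, 3 parameters per fragment
```

Because fragments travel independently, a node behind a slow link can still contribute the fragments that do arrive in time, which is the intuition behind the reported speedups.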

Details
Type
conference paper
Author(s)
Biswas, Sayan (EPFL)
Kermarrec, Anne-Marie (EPFL)
Marouani, Alexis (EPFL)
Pereira Pires, Rafael (EPFL)
Sharma, Rishi (EPFL)
de Vos, Marinus Abraham (EPFL)

Date Issued
2025-01-29
Publisher
ACM
Published in
Proceedings of the ACM Web Conference 2025 (WWW '25)
ISBN of the book
979-8-4007-1274-6

Subjects
Decentralized Learning • Collaborative Machine Learning • Asynchronous Decentralized Learning • Communication Stragglers
Editorial or Peer reviewed
REVIEWED

Written at
EPFL
EPFL units
SACS
Event name: The ACM Web Conference 2025
Event acronym: WWW
Event place: Sydney, Australia
Event date: 2025-04-28 - 2025-05-02

Available on Infoscience
March 5, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/247489
Contact: infoscience@epfl.ch

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.