doctoral thesis

Communication-efficient distributed training of machine learning models

Vogels, Thijs  
2023

In this thesis, we explore techniques for addressing the communication bottleneck in data-parallel distributed training of deep learning models. We investigate algorithms that either reduce the size of the messages exchanged between workers or reduce the number of messages sent and received.

To reduce the size of messages, we propose an algorithm for lossy compression of gradients. This algorithm is compatible with existing high-performance training pipelines based on the all-reduce primitive and leverages the natural approximate low-rank structure in gradients of neural network layers to obtain high compression rates.
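
To make the low-rank idea concrete, the sketch below compresses a layer's 2-D gradient into two thin factors using a single power-iteration step against a warm-started right factor. The function names, the chosen rank, and the warm start are illustrative assumptions for this sketch, not the exact method of the thesis.

```python
import torch

def compress_low_rank(grad, q):
    """Compress a 2-D gradient (m x n) into thin factors p (m x r) and q (n x r).

    One power-iteration step against a warm-started right factor `q` is used;
    because the factors are tiny compared to the full gradient, they can be
    averaged across workers with a standard all-reduce.
    """
    p = grad @ q                      # left factor, shape (m, r)
    p, _ = torch.linalg.qr(p)         # orthonormalise its columns
    q = grad.T @ p                    # right factor, shape (n, r)
    return p, q

def decompress_low_rank(p, q):
    """Reconstruct a rank-r approximation of the gradient."""
    return p @ q.T

# Illustrative usage on a single 512 x 256 gradient with rank 2.
m, n, r = 512, 256, 2
grad = torch.randn(m, n)
q = torch.randn(n, r)                 # warm start, reused across SGD steps
p, q = compress_low_rank(grad, q)
approx = decompress_low_rank(p, q)
compression_ratio = (m * n) / (r * (m + n))   # roughly 85x fewer numbers sent
```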

To reduce the number of messages, we study the decentralized learning paradigm, in which workers do not average their model updates all-to-all in each step of Stochastic Gradient Descent but only communicate with a small subset of their peers. We extend the aforementioned compression algorithm to operate in this setting. We also study the influence of the communication topology on the performance of decentralized learning, highlighting shortcomings of the typical 'spectral gap' metric as a measure of the quality of communication topologies and proposing a new framework for evaluating them. Finally, we propose an alternative communication paradigm for distributed learning over sparse topologies. This paradigm, based on the concept of 'relaying' updates over spanning trees of the communication topology, shows benefits over the typical gossip-based approach, especially when the workers have very heterogeneous data distributions.
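
As a minimal illustration of the decentralized setting, the sketch below runs plain gossip averaging on a ring topology, where each worker mixes its parameters only with its two neighbours. The uniform mixing weights, the ring, and the scalar "models" are illustrative assumptions; the relay-based paradigm mentioned above instead forwards exact updates along spanning trees rather than repeatedly averaging them.

```python
import torch

def gossip_step(params, neighbors):
    """One decentralized averaging round: each worker replaces its parameters
    with the uniform average of its own and its neighbors' parameters."""
    mixed = []
    for i, _ in enumerate(params):
        group = [i] + neighbors[i]
        mixed.append(sum(params[j] for j in group) / len(group))
    return mixed

# Illustrative: 4 workers on a ring, each holding a scalar "model".
params = [torch.tensor(float(i)) for i in range(4)]
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}

for _ in range(10):
    params = gossip_step(params, ring)

# After a few rounds all workers approach the global average (1.5); how fast
# this happens depends on the topology, which is what the spectral-gap
# analysis and the proposed alternative framework try to capture.
print([round(p.item(), 3) for p in params])
```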

Type
doctoral thesis
DOI
10.5075/epfl-thesis-9926
Author(s)
Vogels, Thijs  
Advisors
Jaggi, Martin  
Jury

Prof. Anne-Marie Kermarrec (president); Prof. Martin Jaggi (thesis director); Prof. Patrick Thiran, Prof. Mike Rabbat, Prof. Dan Alistarh (examiners)

Date Issued
2023
Publisher
EPFL
Publisher place
Lausanne
Public defense date
2023-04-11
Thesis number
9926
Number of pages
146

Subjects
Deep learning • machine learning • distributed training • decentralized learning • gradient compression • stochastic gradient descent

EPFL units
MLO  
Faculty
IC  
School
IINFCOM  
Doctoral School
EDIC  
Available on Infoscience
April 12, 2023
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/196913