doctoral thesis

Optimization Algorithms for Decentralized, Distributed and Collaborative Machine Learning

Koloskova, Anastasiia  
2024

Distributed learning is key to enabling the training of modern large-scale machine learning models by parallelizing the learning process. Collaborative learning is essential for learning from privacy-sensitive data that is distributed across various agents, each with a distinct data distribution. Both tasks are distributed in nature, which brings them under a common umbrella.

In this thesis, we examine algorithms for distributed and collaborative learning through the perspective of optimization theory. Specifically, we delve into the theoretical convergence properties of prevalent algorithms (decentralized SGD, local SGD, asynchronous SGD, and clipped SGD, among others), and we address ways to enhance their efficiency.


A significant portion of this thesis centers on decentralized optimization methods for both distributed and collaborative learning applications. These are optimization techniques where agents interact directly with one another, bypassing the need for a central coordinator.
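As a concrete illustration of this setting, the sketch below (not taken from the thesis) shows one round of plain decentralized SGD in NumPy: every agent takes a local stochastic gradient step and then averages its model with its neighbors through a gossip matrix. The names decentralized_sgd_step, grad_fn, W, and lr are illustrative placeholders.

import numpy as np

def decentralized_sgd_step(X, W, grad_fn, lr):
    """One decentralized SGD round.

    X: (n_agents, dim) array of local models.
    W: (n_agents, n_agents) doubly stochastic mixing matrix; W[i, j] > 0 only
       if agents i and j are neighbors in the communication graph.
    grad_fn(i, x): stochastic gradient of agent i's local objective at x.
    """
    n = X.shape[0]
    # Local stochastic gradient step at every agent.
    G = np.stack([grad_fn(i, X[i]) for i in range(n)])
    X_half = X - lr * G
    # Gossip averaging: each agent mixes only with its graph neighbors.
    return W @ X_half

Choosing W as the all-ones matrix divided by the number of agents recovers centralized minibatch-style averaging, while a sparse W keeps every exchange local to the communication graph.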


First, we address the challenge of communication efficiency in decentralized learning, introducing Choco-SGD, the first decentralized optimization algorithm that allows arbitrarily high communication compression while still providing provable convergence guarantees.
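To convey the idea, here is a heavily simplified sketch of the compressed gossip step that Choco-SGD builds on: each agent keeps a public estimate x_hat of its own model, transmits only a compressed correction to that estimate, and runs consensus on the estimates. This is an assumption-laden illustration (the toy random_sparsify compressor, the step size gamma, and the omitted local SGD step are mine); see the thesis for the exact algorithm and parameters.

import numpy as np

def random_sparsify(v, k):
    # Toy compressor: keep k randomly chosen coordinates, zero out the rest.
    out = np.zeros_like(v)
    idx = np.random.choice(v.size, size=k, replace=False)
    out[idx] = v[idx]
    return out

def compressed_gossip_step(X, X_hat, W, gamma, k):
    """X, X_hat: (n, dim) local models and their public estimates; W: mixing matrix."""
    n = X.shape[0]
    # Each agent broadcasts only a compressed correction to its public estimate.
    Q = np.stack([random_sparsify(X[i] - X_hat[i], k) for i in range(n)])
    X_hat = X_hat + Q
    # Consensus step acts on the (compressed) public estimates, not the raw models.
    X = X + gamma * (W - np.eye(n)) @ X_hat
    return X, X_hat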


Subsequently, we analyze the convergence properties of a large class of distributed and decentralized algorithms, including decentralized SGD (the predominant algorithm for decentralized optimization), under a unified framework. Our approach imposes milder assumptions on the objective functions and communication topologies than existing works, and our theory recovers and improves upon the previously best-known convergence rates for all algorithms covered by the framework.
Our theory also reveals that the performance of decentralized SGD-based algorithms deteriorates when agents hold differing data, a scenario commonly encountered in practical collaborative learning applications.
We therefore study and improve the convergence rates of Gradient Tracking, an algorithm whose convergence behavior is known to be unaffected by data heterogeneity.
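For reference, one common form of the Gradient Tracking update is sketched below (a standard variant, not necessarily the exact one analyzed in the thesis): each agent maintains a tracker y_i of the network-average gradient and descends along it, which removes the bias introduced by heterogeneous local data. Initialization would typically set Y and G_prev to the local gradients at the starting models.

import numpy as np

def gradient_tracking_step(X, Y, G_prev, W, grad_fn, lr):
    """X: local models, Y: gradient trackers, G_prev: previous local gradients, all (n, dim)."""
    n = X.shape[0]
    X_new = W @ (X - lr * Y)   # mix with neighbors, descending along the tracked direction
    G_new = np.stack([grad_fn(i, X_new[i]) for i in range(n)])
    Y_new = W @ Y + G_new - G_prev   # track the average gradient across agents
    return X_new, Y_new, G_new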

Additionally, we explore asynchronous optimization techniques as an alternative approach to enhancing communication efficiency, focusing on the simpler centralized communication scheme. We improve the existing theoretical convergence bounds for the asynchronous SGD algorithm; in particular, one of our main contributions is to show that asynchronous SGD in distributed learning scenarios is always faster than synchronous minibatch SGD.
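The standard way to model this setting is SGD with delayed gradients: the server applies each worker's gradient as soon as it arrives, even though it was computed at a stale iterate. The small simulation below is only an illustration of that model (the delay distribution, max_delay, and function names are mine, not the thesis's).

import numpy as np

def async_sgd_simulated(grad_fn, x0, lr, n_steps, max_delay, rng=np.random):
    """Simulate asynchronous SGD as SGD with random gradient delays."""
    history = [x0.copy()]   # past iterates, so delayed gradients refer back to them
    x = x0.copy()
    for t in range(n_steps):
        tau = rng.randint(0, min(t, max_delay) + 1)   # staleness of the arriving gradient
        x_stale = history[t - tau]                    # iterate the gradient was computed at
        x = x - lr * grad_fn(x_stale)                 # server applies the delayed gradient
        history.append(x.copy())
    return x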

Lastly, we turn our attention to privacy, which is especially important for collaborative learning algorithms, as the local agents' data are typically privacy-sensitive. We study the convergence behavior of the clipped SGD algorithm, a core component of differentially private learning algorithms. We also study the convergence properties of MF-DP-FTRL algorithms, which inject correlated noise across iterations to achieve privacy, and we propose a new correlated noise schedule based on our enhanced theory.
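As an illustration of the clipping step, here is a minimal DP-SGD-style update in NumPy: each per-sample gradient is clipped to norm at most clip_c, the clipped gradients are averaged, and Gaussian noise is added before the model update. Parameter names and the noise calibration are illustrative, not the thesis's notation.

import numpy as np

def clipped_noisy_sgd_step(x, per_sample_grads, lr, clip_c, noise_std, rng=np.random):
    """per_sample_grads: (batch, dim) array of individual gradients."""
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_c / np.maximum(norms, 1e-12))   # clip each gradient to norm <= clip_c
    g = (per_sample_grads * scale).mean(axis=0)
    g = g + rng.normal(0.0, noise_std, size=g.shape)             # Gaussian noise for privacy
    return x - lr * g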

Type: doctoral thesis
DOI: 10.5075/epfl-thesis-9927
Author(s): Koloskova, Anastasiia
Advisors: Jaggi, Martin • Stich, Sebastian Urban
Jury: Prof. Rachid Guerraoui (president); Prof. Martin Jaggi, Dr Sebastian Urban Stich (thesis directors); Prof. Volkan Cevher, Prof. Ali Jadbabaie, Prof. Gauri Joshi (examiners)
Date Issued: 2024
Publisher: EPFL
Publisher place: Lausanne
Public defense date: 2024-02-16
Thesis number: 9927
Total pages: 262
Subjects: Stochastic optimization • decentralized learning • distributed learning • machine learning • compressed communications • asynchronous communications • gradient clipping
EPFL units: MLO
Faculty: IC
School: IINFCOM
Doctoral School: EDIC
Award: ELLIS PhD Award (2024)
Available on Infoscience: February 5, 2024
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/203465