Abstract

The success of deep learning may be attributed in large part to the remarkable growth in the size and complexity of deep neural networks. However, present learning systems raise significant efficiency and privacy concerns: (1) training systems are lagging behind the rapid growth of deep neural architectures, so the training efficiency of deep learning algorithms cannot be guaranteed; (2) most learning is performed in a centralized manner, yet massive amounts of data are created on decentralized edge devices and may contain sensitive information about users. All of these considerations motivate the migration to distributed deep learning. In this thesis, we study efficiency and robustness, two fundamental problems that have emerged in distributed deep learning. We first propose strategies to improve communication efficiency, a bottleneck to scaling distributed learning systems out and up, from several angles: the study starts by understanding the trade-off between communication frequency and generalization performance, and then extends to decentralized and sparse communication topologies with compressed communication. Next, we investigate the computational efficiency of deep learning, yet another crucial factor that determines learning and deployment efficiency; the proposed solutions generalize to a variety of scenarios. Finally, learning with edge devices introduces various kinds of heterogeneity (e.g., data heterogeneity and system heterogeneity) in practice. As the last key contribution of this thesis, we develop robust decentralized/federated algorithms that are resistant to real-world challenges such as client data distribution shifts and heterogeneous computing systems.
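To make the communication-frequency trade-off mentioned above concrete, below is a minimal, self-contained sketch (not the thesis's actual method) of local SGD with periodic model averaging on synthetic data. The worker count K, local-step count H, and the linear-regression setup are illustrative assumptions; H controls how often the workers communicate, trading communication cost against how far the local models drift apart between averaging rounds.

```python
import numpy as np

# Sketch of local SGD with periodic averaging (hypothetical setup):
# each of K workers runs H local gradient steps on its own data shard,
# then all models are averaged in one communication round.

rng = np.random.default_rng(0)
K, H, rounds, lr, d = 4, 8, 50, 0.1, 10

# Synthetic linear-regression shards, one per worker (illustrative data only).
true_w = rng.normal(size=d)
shards = []
for _ in range(K):
    X = rng.normal(size=(100, d))
    y = X @ true_w + 0.01 * rng.normal(size=100)
    shards.append((X, y))

w = np.zeros(d)  # shared model after each communication round
for _ in range(rounds):
    local_models = []
    for X, y in shards:
        w_local = w.copy()
        for _ in range(H):  # H local steps without any communication
            grad = 2 * X.T @ (X @ w_local - y) / len(y)
            w_local -= lr * grad
        local_models.append(w_local)
    w = np.mean(local_models, axis=0)  # one averaging (communication) step

print("distance to target:", np.linalg.norm(w - true_w))
```

Increasing H reduces the number of communication rounds needed for a given number of gradient steps, but with heterogeneous shards the local models drift further before each averaging step, which is one face of the frequency-versus-generalization trade-off studied in the thesis.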
