Communication trade-offs for Local-SGD with large step size

Patel, Kumar Kshitij; Dieuleveut, Aymeric

2019

Abstract

Synchronous mini-batch SGD is state-of-the-art for large-scale distributed machine learning. However, in practice, its convergence is bottlenecked by slow communication rounds between worker nodes. A natural solution to reduce communication is to use the "local-SGD" model, in which the workers train their model independently and synchronize every once in a while. This algorithm improves the computation-communication trade-off, but its convergence is not understood very well. We propose a non-asymptotic error analysis, which enables comparison to one-shot averaging, i.e., a single communication round among independent workers, and mini-batch averaging, i.e., communicating at every step. We also provide adaptive lower bounds on the communication frequency for large step-sizes ($t^{-\alpha}$, $\alpha \in (1/2, 1)$) and show that local-SGD reduces communication by a factor of $O(\sqrt{T}/P^{3/2})$, with $T$ the total number of gradients and $P$ machines.
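The scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or experimental setup: the function name `local_sgd`, the toy objective f(w) = 0.5 (w - 1)^2, the unit-variance gradient noise, and the constants `c`, `alpha`, and `seed` are all assumptions chosen for illustration. Setting `sync_every=1` corresponds to communicating at every step (mini-batch averaging), and `sync_every=num_steps` to a single communication round (one-shot averaging).

```python
import random


def local_sgd(num_workers, num_steps, sync_every, alpha=0.75, c=0.5, seed=0):
    """Toy local-SGD on f(w) = 0.5 * (w - 1)^2 with noisy gradients.

    Every worker runs SGD with step size c * t**(-alpha), and all workers
    replace their iterate by the average every `sync_every` steps.
    Returns (average of the final iterates, number of communication rounds).
    """
    rng = random.Random(seed)
    target = 1.0                      # minimizer of the toy objective (assumption)
    workers = [0.0] * num_workers     # all workers start from the same point
    rounds = 0
    for t in range(1, num_steps + 1):
        step = c * t ** (-alpha)      # "large" step size t^(-alpha), alpha in (1/2, 1)
        # Local step: each worker uses its own noisy gradient, no communication.
        workers = [
            w - step * ((w - target) + rng.gauss(0.0, 1.0))
            for w in workers
        ]
        # Communication round: average the iterates and broadcast the result.
        if t % sync_every == 0:
            mean = sum(workers) / num_workers
            workers = [mean] * num_workers
            rounds += 1
    return sum(workers) / num_workers, rounds


if __name__ == "__main__":
    for h in (1, 10, 2000):
        w, rounds = local_sgd(num_workers=8, num_steps=2000, sync_every=h)
        print(f"sync_every={h}: communication rounds={rounds}, final iterate={w:.4f}")
```

On this toy problem the gradient is linear in `w`, so the synchronization frequency barely affects the final error; the sketch only shows where communication enters the algorithm and how the number of rounds shrinks as `sync_every` grows.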

Details

Title Communication trade-offs for Local-SGD with large step size

Author(s) Patel, Kumar Kshitij ; Dieuleveut, Aymeric

Published in Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

Series Advances in Neural Information Processing Systems

Volume 32

Conference 33rd Conference on Neural Information Processing Systems (NeurIPS), Dec 08-14, 2019, Vancouver, Canada

Date 2019-01-01

Publisher La Jolla, Neural Information Processing Systems (NIPS)

ISSN 1049-5258

Keywords

stochastic-approximation

Laboratories MLO

Record Appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > MLO - Machine Learning and Optimization Laboratory
Peer-reviewed publications
Conference Papers
Work produced at EPFL
Published

Record creation date 2020-07-10
