Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. Student works
  4. Convex Optimization using Sparsified Stochastic Gradient Descent with Memory
 
master thesis

Convex Optimization using Sparsified Stochastic Gradient Descent with Memory

Cordonnier, Jean-Baptiste  
June 27, 2018

The interest for distributed stochastic optimization has raised to train complex Machine Learning models with more data on distributed systems. Increasing the computation power speeds up the training but it faces a communication bottleneck between workers which hurts the scale-up of these distributed algorithms. Previous work tried to address this issue through quantization by broadcasting low precision updates (Seide et al., 2014; Alistarh et al., 2017) or through sparsification by sharing only the most important coordinates to update (Aji and Heafield, 2017; Lin et al., 2018). Even though the sparsifica- tion method works well in practice, it lacked a theoretical proof until now. We propose a sparsification scheme for SGD where only a small constant number of coordinates are applied at each iteration. The amount of data to be communicated is drastically reduced while the O(1/T ) convergence rate of vanilla SGD is preserved. Our concise proof extends to parallel setting and gains a linear speed up in the number of workers. We exper- iment with sparsified SGD with memory in C++ and Python, and show the excellent convergence properties for top-k and random-k operators. Our scheme outperforms QSGD in progress per number of bits sent. It also opens the path to using lock-free asynchronous parallelization (e.g. Hogwild! Niu et al. (2011)) on dense problems as the sparsity of the gradient updates is enforced.

  • Files
  • Details
  • Metrics
Type
master thesis
Author(s)
Cordonnier, Jean-Baptiste  
Advisors
Stich, Sebastian Urban  
•
Jaggi, Martin  
Date Issued

2018-06-27

Total of pages

49 pages

Subjects

Stochastic Gradient Descent

•

Convex optimization

•

Distributed optimization

Written at

EPFL

EPFL units
MLO  
Available on Infoscience
January 7, 2019
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/153353
Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés