conference paper

Masked Training of Neural Networks with Partial Gradients

Mohtashami, Amirkeivan • Jaggi, Martin • Stich, Sebastian U.
January 1, 2022
International Conference On Artificial Intelligence And Statistics, Vol 151
International Conference on Artificial Intelligence and Statistics

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extra-gradient), limiting SGD updates to a subset of parameters for increased efficiency (such as meProp) or a combination of both (such as Dropout). However, the convergence of these methods is often not studied in theory.
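The core mechanism shared by these variants — applying an SGD step to only part of the gradient — can be sketched in a few lines. The toy example below is illustrative only (a quadratic objective with a random coordinate mask, not the paper's setup or meProp's actual selection rule):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective: f(w) = 0.5 * ||w - target||^2, so grad f(w) = w - target.
target = np.array([1.0, -2.0, 3.0, 0.5])


def grad(w):
    return w - target


w = np.zeros_like(target)
lr = 0.5

# Masked training: each step updates only a random subset of coordinates,
# i.e. the update uses a partial gradient rather than the full one.
for step in range(200):
    mask = rng.random(w.shape) < 0.5   # keep roughly half the entries
    w -= lr * mask * grad(w)

# Despite the partial updates, w still converges to the minimizer.
print(np.allclose(w, target, atol=1e-3))
```

Each coordinate is updated in only about half the steps, yet every coordinate's error still contracts geometrically over the steps in which it is active — the kind of behavior the paper's framework analyzes in general.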

We propose a unified theoretical framework to study such SGD variants, encompassing the aforementioned algorithms and additionally a broad variety of methods used for communication-efficient training or model compression. Our insights can be used as a guide to improve the efficiency of such methods and facilitate generalization to new applications. As an example, we tackle the task of jointly training networks, a version of which (limited to sub-networks) is used to create Slimmable Networks. By training a low-rank Transformer jointly with a standard one, we obtain better performance than when it is trained separately.
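The joint-training idea can also be reduced to a minimal numeric sketch: one shared weight vector serves two "models" (a full one and a restricted sub-model standing in for a slimmed or low-rank variant), and each step applies the sum of both gradients — the sub-model's gradient being exactly a partial gradient of the shared weights. All names here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Shared weights serve two models: the "full" model uses every coordinate,
# the "sub" model only a fixed subset (a stand-in for a slimmed variant).
target = np.array([2.0, -1.0, 0.5, 4.0])
sub_mask = np.array([1.0, 1.0, 0.0, 0.0])  # coordinates the sub-model owns


def full_grad(w):
    # gradient of 0.5 * ||w - target||^2 for the full model
    return w - target


def sub_grad(w):
    # the sub-model's gradient: the same gradient masked to its coordinates,
    # i.e. a partial gradient of the shared weights
    return sub_mask * (w - target)


w = np.zeros_like(target)
lr = 0.1
for _ in range(500):
    # one joint step on the sum of both objectives
    w -= lr * (full_grad(w) + sub_grad(w))

print(np.allclose(w, target, atol=1e-4))
```

Because the sub-model's update is a masked version of the full gradient, joint training of this kind falls directly under the partial-gradient framework described in the abstract.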

Type: conference paper
Web of Science ID: WOS:000841852300012
Author(s): Mohtashami, Amirkeivan; Jaggi, Martin; Stich, Sebastian U.
Date Issued: 2022-01-01
Publisher: JMLR-JOURNAL MACHINE LEARNING RESEARCH
Publisher place: San Diego
Published in: International Conference On Artificial Intelligence And Statistics, Vol 151
Series title / vol.: Proceedings of Machine Learning Research, Vol. 151
Pages: 5876–5890
Subjects: Computer Science, Artificial Intelligence • Statistics & Probability • Computer Science • Mathematics
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: MLO
Event: International Conference on Artificial Intelligence and Statistics, ELECTR NETWORK (online), Mar 28-30, 2022
Available on Infoscience: November 7, 2022
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/191911
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.