Infoscience

conference paper not in proceedings

Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks

Papazov, Hristo Georgiev; Pesme, Scott; Flammarion, Nicolas
March 10, 2024
Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024

In this work, we investigate the effect of momentum on the optimisation trajectory of gradient descent. We leverage a continuous-time approach in the analysis of momentum gradient descent with step size $\gamma$ and momentum parameter $\beta$ that allows us to identify an intrinsic quantity $\lambda = \frac{ \gamma }{ (1 - \beta)^2 }$ which uniquely defines the optimisation path and provides a simple acceleration rule. When training a $2$-layer diagonal linear network in an overparametrised regression setting, we characterise the recovered solution through an implicit regularisation problem. We then prove that small values of $\lambda$ help to recover sparse solutions. Finally, we give similar but weaker results for stochastic momentum gradient descent. We provide numerical experiments which support our claims.
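
As a rough illustration of the quantity $\lambda = \frac{\gamma}{(1 - \beta)^2}$ described above, the following is a minimal sketch, not the authors' implementation. It assumes the heavy-ball form of momentum gradient descent, a diagonal network parametrised as $w = u \odot v$, a squared loss, and a synthetic sparse regression problem; the function names `intrinsic_lambda` and `train_dln`, the initialisation scale `alpha`, and all numerical values are hypothetical choices made for the example.

```python
import numpy as np


def intrinsic_lambda(gamma, beta):
    """Intrinsic quantity from the abstract: lambda = gamma / (1 - beta)^2."""
    return gamma / (1.0 - beta) ** 2


def train_dln(X, y, gamma, beta, alpha=0.1, steps=2000):
    """Heavy-ball momentum GD on an assumed diagonal network w = u * v.

    The loss is (1 / 2n) * ||X (u * v) - y||^2 (an assumption of this sketch).
    """
    n, d = X.shape
    u = np.full(d, alpha)   # assumed initialisation scale
    v = np.zeros(d)         # so that w = u * v starts at zero
    u_prev, v_prev = u.copy(), v.copy()
    for _ in range(steps):
        g = X.T @ (X @ (u * v) - y) / n   # gradient of the loss w.r.t. w
        # Chain rule: dL/du = g * v and dL/dv = g * u, plus the momentum term.
        u_next = u - gamma * g * v + beta * (u - u_prev)
        v_next = v - gamma * g * u + beta * (v - v_prev)
        u_prev, v_prev, u, v = u, v, u_next, v_next
    return u * v


# Synthetic overparametrised problem (d > n) with a sparse ground truth.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 40))
w_star = np.zeros(40)
w_star[:3] = 1.0
y = X @ w_star

# Two (gamma, beta) pairs chosen so that both give lambda = 0.01.
for gamma, beta in [(0.01, 0.0), (0.0025, 0.5)]:
    w = train_dln(X, y, gamma, beta)
    print(intrinsic_lambda(gamma, beta),
          np.mean((X @ w - y) ** 2),   # training error of the recovered w
          np.abs(w).sum())             # l1 norm as a crude sparsity proxy
```

Rerunning the loop with pairs that give different values of $\lambda$ is one way to probe the claim about sparse recovery empirically; the sketch reports the training error and the $\ell_1$ norm only as convenient summaries, not as the quantities analysed in the paper.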

Name: Implicit_Bias_Momentum-2.pdf
Type: Postprint
Version: Accepted version
Access type: openaccess
License Condition: CC BY
Size: 5.2 MB
Format: Adobe PDF
Checksum (MD5): 4549a44e5737c20f3cb9fe9e79a8b5ad

Contact: infoscience@epfl.ch


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved