research article

Geometric compression of invariant manifolds in neural networks

Paccolat, Jonas • Petrini, Leonardo • Geiger, Mario • Tyloo, Kevin • Wyart, Matthieu
April 1, 2021
Journal Of Statistical Mechanics-Theory And Experiment

We study how neural networks compress uninformative input space in models where data lie in d dimensions but whose labels vary only within a linear manifold of dimension d_∥ < d. We show that for a one-hidden-layer network initialized with infinitesimal weights (i.e. in the feature-learning regime) and trained with gradient descent, the first layer of weights evolves to become nearly insensitive to the d_⊥ = d − d_∥ uninformative directions. These are effectively compressed by a factor λ ∼ p, where p is the size of the training set. We quantify the benefit of such a compression on the test error ε. For large initialization of the weights (the lazy-training regime), no compression occurs, and for regular boundaries separating labels we find that ε ∼ p^(−β), with β_Lazy = d/(3d − 2). Compression improves the learning curves, so that β_Feature = (2d − 1)/(3d − 2) if d_∥ = 1 and β_Feature = (d + d_⊥/2)/(3d − 2) if d_∥ > 1. We test these predictions for a stripe model where boundaries are parallel interfaces (d_∥ = 1) as well as for a cylindrical boundary (d_∥ = 2). Next, we show that compression shapes the evolution of the neural tangent kernel (NTK) in time, so that its top eigenvectors become more informative and display a larger projection on the labels. Consequently, kernel learning with the frozen NTK at the end of training outperforms the initial NTK. We confirm these predictions both for a one-hidden-layer fully connected network trained on the stripe model and for a 16-layer convolutional neural network trained on the Modified National Institute of Standards and Technology database (MNIST), for which we also find β_Feature > β_Lazy. The great similarities found in these two cases support the idea that compression is central to the training of MNIST, and put forward kernel principal component analysis on the evolving NTK as a useful diagnostic of compression in deep networks.
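The stripe-model setup and the compression effect described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical example (assuming PyTorch; the architecture, hinge loss, learning rate, and the weight-norm ratio used to quantify compression are illustrative choices, not the paper's exact protocol): it trains a small one-hidden-layer network on data whose labels depend only on the first coordinate, then compares first-layer weight norms along the informative and uninformative directions.

```python
import torch

torch.manual_seed(0)

d, p, h = 10, 1024, 256   # input dimension, training-set size, hidden width

# Stripe-model-like data (hypothetical parameters): Gaussian inputs in d
# dimensions, labels depending only on the first coordinate, so d_par = 1
# and d_perp = d - 1.
x = torch.randn(p, d)
y = torch.sign(torch.sin(3.0 * x[:, 0]))

# One-hidden-layer network; a small initialization scale `alpha` mimics the
# feature-learning regime discussed in the abstract.
alpha = 1e-3
w1 = (alpha * torch.randn(h, d)).requires_grad_()
w2 = (alpha * torch.randn(h)).requires_grad_()

def net(inputs):
    return torch.relu(inputs @ w1.t()) @ w2 / h

opt = torch.optim.SGD([w1, w2], lr=0.5)
for step in range(5000):
    opt.zero_grad()
    loss = torch.relu(1.0 - y * net(x)).mean()   # hinge loss
    loss.backward()
    opt.step()

# Compression diagnostic: per-direction weight norm along the informative
# coordinate vs. the d - 1 uninformative ones; a large ratio means the first
# layer has become nearly insensitive to the uninformative directions.
informative = w1[:, 0].norm()
uninformative = w1[:, 1:].norm() / (d - 1) ** 0.5
print("compression ratio:", (informative / uninformative).item())
```

In the same spirit, the NTK diagnostic mentioned at the end of the abstract would amount to computing the empirical tangent kernel before and after training and comparing how strongly its top eigenvectors project onto the labels.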

Type
research article
DOI
10.1088/1742-5468/abf1f3
Web of Science ID
WOS:000644136000001
Author(s)
Paccolat, Jonas • Petrini, Leonardo • Geiger, Mario • Tyloo, Kevin • Wyart, Matthieu
Date Issued
2021-04-01
Publisher
IOP Publishing Ltd
Published in
Journal Of Statistical Mechanics-Theory And Experiment
Volume
2021
Issue
4
Article Number
044001
Subjects
Mechanics • Physics, Mathematical • Physics • deep learning • learning theory • machine learning
Editorial or Peer reviewed
REVIEWED
Written at
EPFL
EPFL units
PCSL
Available on Infoscience
May 22, 2021
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/178159