Infoscience
research article

Scaling description of generalization with number of parameters in deep learning

Geiger, Mario • Jacot, Arthur • Spigler, Stefano • Gabriel, Franck • Sagun, Levent • d'Ascoli, Stephane • Biroli, Giulio • Hongler, Clement • Wyart, Matthieu
February 1, 2020
Journal Of Statistical Mechanics-Theory And Experiment

Supervised deep learning involves the training of neural networks with a large number N of parameters. For large enough N, in the so-called over-parametrized regime, one can essentially fit the training data points. Sparsity-based arguments would suggest that the generalization error increases as N grows past a certain threshold N*. Instead, empirical studies have shown that in the over-parametrized regime, generalization error keeps decreasing with N. We resolve this paradox through a new framework. We rely on the so-called Neural Tangent Kernel, which connects large neural nets to kernel methods, to show that the initialization causes finite-size random fluctuations ‖f_N − ⟨f_N⟩‖ ∼ N^(−1/4) of the neural net output function f_N around its expectation ⟨f_N⟩. These affect the generalization error ε(f_N) for classification: under natural assumptions, it decays to a plateau value ε(f_∞) in a power-law fashion ∼ N^(−1/2). This description breaks down at the so-called jamming transition N = N*. At this threshold, we argue that ‖f_N‖ diverges. This result leads to a plausible explanation for the cusp in test error known to occur at N*. Our results are confirmed by extensive empirical observations on the MNIST and CIFAR image datasets. Our analysis finally suggests that, given a computational envelope, the smallest generalization error is obtained using several networks of intermediate sizes, just beyond N*, and averaging their outputs.
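The abstract's central quantitative claim is that, in the over-parametrized regime, the test error decays to a plateau as ε(f_N) ≈ ε(f_∞) + c · N^(−1/2). Below is a minimal sketch in Python of fitting that scaling form to a sweep of network sizes. The (N, test error) measurements are hypothetical placeholders, not data from the paper, and `scaling_law` is an illustrative name.

```python
import numpy as np
from scipy.optimize import curve_fit

# Scaling form suggested by the abstract: eps(f_N) ~ eps(f_inf) + c * N**(-1/2),
# valid in the over-parametrized regime (N well beyond the jamming threshold N*).
def scaling_law(N, eps_inf, c):
    return eps_inf + c / np.sqrt(N)

# Hypothetical (N, test error) measurements; replace with your own sweep of
# network sizes on, e.g., MNIST or CIFAR.
N_values = np.array([1e4, 3e4, 1e5, 3e5, 1e6])
test_err = np.array([0.062, 0.049, 0.041, 0.036, 0.033])

# Least-squares fit of the plateau value and the power-law prefactor.
(eps_inf, c), _ = curve_fit(scaling_law, N_values, test_err, p0=[0.03, 1.0])
print(f"estimated plateau eps(f_inf) ~ {eps_inf:.3f}, prefactor c ~ {c:.2f}")
```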

Details
Type
research article
DOI
10.1088/1742-5468/ab633c
Web of Science ID
WOS:000523277900001
Author(s)
Geiger, Mario  
Jacot, Arthur  
Spigler, Stefano  
Gabriel, Franck  
Sagun, Levent  
d'Ascoli, Stephane
Biroli, Giulio
Hongler, Clement  
Wyart, Matthieu  
Date Issued
2020-02-01
Publisher
IOP PUBLISHING LTD
Published in
Journal Of Statistical Mechanics-Theory And Experiment
Volume
2020
Issue
2
Article Number
023401
Subjects
Mechanics • Physics, Mathematical • Physics • learning theory • machine learning
Editorial or Peer reviewed
REVIEWED
Written at
EPFL
EPFL units
PCSL  
CSFT  
Available on Infoscience
April 17, 2020
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/168223