Infoscience
 
Doctoral thesis

Deep Neural Networks: Large-Width Behavior and Generalization Bounds

Golikov, Evgenii  
2025

Over the past several decades, neural network architectures have undergone considerable evolution: from shallow to deep, from fully connected to structured (e.g., convolutional, recurrent, or residual), and from unnormalized to normalized. This thesis contributes to Deep Learning Theory, the field that studies the behavior of neural networks.

An essential part of this work is devoted to Tensor Programs, first introduced by Yang (2019a): a unifying formalism covering both training and inference for a vast class of neural network architectures. The theory of Tensor Programs is closely related to the theory of infinitely wide networks, which we survey exhaustively in a dedicated chapter. In a direction orthogonal to Tensor Programs, we also present results on the exact minima of L2-regularized objective functions. Finally, we state a novel a-priori generalization bound for neural networks whose activation functions are close to linear.

The cornerstone of the theory of Tensor Programs is the Master Theorem. It states that any scalar generated by a Tensor Program (e.g., a loss or accuracy value at a given training step) converges to a deterministic limit given by a certain recurrence formula. The theorem generalizes several existing results in the theory of infinitely wide neural networks (Yang, 2019b, 2020a; Yang and Littwin, 2021) and recovers classical results in random matrix theory (Yang, 2020b).
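As a hedged sketch of the theorem's typical form (notation assumed here rather than taken from the thesis): for vectors h^1, ..., h^k in R^n generated by a Tensor Program with Gaussian-initialized weight matrices, and for any polynomially bounded test function psi,

\[
\frac{1}{n}\sum_{i=1}^{n} \psi\big(h^1_i, \dots, h^k_i\big) \;\xrightarrow[\,n\to\infty\,]{\text{a.s.}}\; \mathbb{E}\big[\psi\big(Z^{h^1}, \dots, Z^{h^k}\big)\big],
\]

where (Z^{h^1}, ..., Z^{h^k}) is a Gaussian vector whose mean and covariance are defined by a recurrence over the program's instructions; losses and other scalars arise as such coordinatewise averages.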

In this thesis, we generalize this result to non-Gaussian weight initializations, thereby proving that the Master Theorem is universal with respect to the initial weight distribution. We also provide tail bounds for the scalars generated by a Tensor Program; these tail bounds are crucial for analyzing convergence rates.

In addition, we provide a comprehensive survey of the theory of networks in a specific infinite-width limit, namely the Neural Tangent Kernel (NTK) limit.
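For reference, a hedged sketch of the kernel this limit is named after (standard notation, not taken from the thesis): for a network f(x; θ) with parameters θ, the Neural Tangent Kernel is

\[
\Theta_\theta(x, x') = \big\langle \nabla_\theta f(x;\theta),\; \nabla_\theta f(x';\theta) \big\rangle,
\]

and in the NTK limit (infinite width under a suitable parameterization) this kernel converges at initialization to a deterministic limit and remains constant during gradient-descent training, so the network behaves like kernel regression with that fixed kernel.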

Apart from infinitely wide networks, this thesis also analyzes the minima of L2-regularized loss functions for networks of finite width. One of our results is that the regularized loss cannot be improved by adding new neurons beyond a certain finite width threshold that depends on the size of the training dataset.
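Stated schematically (a hedged formalization of the claim above, with notation assumed rather than taken from the thesis): writing

\[
R_m = \min_{\theta} \Big( L(f_{\theta, m}) + \lambda \lVert \theta \rVert_2^2 \Big)
\]

for the best regularized loss achievable by a network of width m, the result says that R_m is non-increasing in m and that there exists a finite threshold m*, depending on the number of training examples, such that R_m = R_{m*} for all m ≥ m*.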

Finally, we prove a novel generalization bound for networks with activation functions that are close to linear. To the best of our knowledge, this is the first bound that is simultaneously non-vacuous (i.e., it guarantees that the population risk is below that of a random-guess model) and a priori (i.e., it can be computed before the model is trained).
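Schematically (a hedged reading of these two properties, not the thesis's exact statement), such a bound has the form

\[
\mathcal{R}(\hat f) \;\le\; B(\text{architecture},\, n,\, \delta) \quad \text{with probability at least } 1 - \delta \text{ over the training sample},
\]

where B can be evaluated before training from the architecture, the sample size n, and the confidence level δ (a priori), and satisfies B < the risk of a random-guess model (non-vacuous).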

We hope our work (1) demonstrates that neural networks exhibit the universality phenomena ubiquitous in the natural sciences, (2) helps to better understand the behavior of infinitely wide networks, (3) connects them to networks of finite width, (4) suggests when making a network wider does not improve performance, and (5) provides better generalization guarantees for neural networks.

Files
  • Name: EPFL_TH11255.pdf
  • Type: Main Document
  • Version: Not Applicable (or Unknown)
  • Access type: openaccess
  • License Condition: N/A
  • Size: 5.48 MB
  • Format: Adobe PDF
  • Checksum (MD5): b50d8cc5a6839281d8a7914430c0f01d
