 
doctoral thesis

Topics in statistical physics of high-dimensional machine learning

Cui, Hugo Chao  
2024

In the past few years, Machine Learning (ML) techniques have ushered in a paradigm shift, allowing ever more abundant sources of data to be harnessed to automate complex tasks. The technical workhorse behind these breakthroughs arguably lies in the use of artificial neural networks to learn informative and actionable representations of data, from data. Yet while empirical successes continue to accrue, a solid theoretical understanding of the unreasonable effectiveness of ML methods in learning from high-dimensional data remains largely elusive. This is the question addressed in this thesis, through the study of solvable high-dimensional models satisfying the dual requirement of (a) capturing the key features of practical ML tasks while (b) remaining amenable to mathematical analysis. Borrowing ideas from statistical physics, the thesis presents sharp asymptotic incursions into a selection of central aspects of modern ML.

The remarkable versatility of ML models lies in their ability to extract informative features from data. The first part of the thesis analyzes which structural characteristics of these features condition what ML methods can learn. Specifically, it highlights how, in several settings, a theory formulated in terms of only two statistical descriptors tightly captures the learning curves of simple real-world tasks. For kernel methods in particular, this insight makes it possible to relate error scaling laws to the structure of the features, as the numerical sketch below illustrates.
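
As a purely numerical illustration of this last point (the thesis derives such scaling laws analytically; all parameters below are hypothetical), the following sketch runs ridge regression in a kernel eigenbasis with power-law eigenvalue decay and fits the power-law exponent of the resulting learning curve:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 2000                               # truncated kernel eigenbasis (hypothetical size)
    alpha, a = 1.5, 1.2                    # hypothetical spectral / target decay exponents
    lam = np.arange(1.0, d + 1) ** -alpha  # power-law kernel eigenvalue decay
    theta = np.arange(1.0, d + 1) ** -a    # target coefficients in the eigenbasis

    def excess_risk(n, ridge=1e-6):
        # Features x_k ~ N(0, lam_k): ridge regression in the eigenbasis,
        # equivalent to kernel ridge regression with this spectrum.
        X = rng.standard_normal((n, d)) * np.sqrt(lam)
        w = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ (X @ theta))
        # Population excess risk: sum_k lam_k (w_k - theta_k)^2
        return np.sum(lam * (w - theta) ** 2)

    ns = [100, 200, 400, 800, 1600]
    errs = [np.mean([excess_risk(n) for _ in range(5)]) for n in ns]
    slope = np.polyfit(np.log(ns), np.log(errs), 1)[0]
    print(f"learning curve decays roughly as n^{slope:.2f}")

The fitted exponent depends on the two decay parameters alpha and a, which is the sense in which the error scaling law reflects the structure of the features.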

The second part narrows the focus to the question of which features are extracted by multi-layer neural networks, both (a) when untrained and (b) when trained, either in the Bayesian framework or after a single large gradient step. In particular, it delineates the cases in which Gaussian universality holds and limits the network's expressivity, and the cases in which neural networks succeed in learning non-trivial features.
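
To make the notion of Gaussian universality concrete, here is a minimal sketch, under assumed dimensions and a ReLU activation, of the Gaussian-equivalence heuristic for random features: the nonlinear features relu(Wx) are swapped for a linear-plus-noise surrogate with matched low-order moments, and ridge regression attains a nearly identical test error:

    import numpy as np

    rng = np.random.default_rng(0)
    d, p, n, n_test = 200, 400, 600, 2000        # hypothetical dimensions
    relu = lambda z: np.maximum(z, 0.0)

    # Gaussian-equivalence coefficients of the activation (Monte-Carlo moments)
    g = rng.standard_normal(1_000_000)
    mu0 = relu(g).mean()                          # E[relu(g)]
    mu1 = (g * relu(g)).mean()                    # E[g relu(g)]
    mu_star = np.sqrt(relu(g).var() - mu1 ** 2)   # residual nonlinear strength

    W = rng.standard_normal((p, d)) / np.sqrt(d)
    theta = rng.standard_normal(d) / np.sqrt(d)   # linear teacher
    X, Xt = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
    y, yt = X @ theta, Xt @ theta

    def ridge_test_error(F, Ft, ridge=1e-2):
        w = np.linalg.solve(F.T @ F + ridge * np.eye(p), F.T @ y)
        return np.mean((Ft @ w - yt) ** 2)

    # (a) genuine ReLU random features vs (b) Gaussian-equivalent surrogate
    err_rf = ridge_test_error(relu(X @ W.T), relu(Xt @ W.T))
    Geq = lambda Z, m: mu0 + mu1 * (Z @ W.T) + mu_star * rng.standard_normal((m, p))
    err_eq = ridge_test_error(Geq(X, n), Geq(Xt, n_test))
    print(err_rf, err_eq)   # close, as Gaussian equivalence predicts in high dimension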

Finally, supervised learning with fully-connected architectures constitutes but a small part of the zoology of modern ML tasks. The last part of the thesis extends these sharp asymptotic explorations to more modern aspects of the discipline, in particular transport-based generative models and dot-product attention mechanisms.
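
For reference, dot-product attention takes the standard scaled-softmax form; a self-contained sketch (dimensions hypothetical):

    import numpy as np

    def dot_product_attention(Q, K, V):
        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V

    rng = np.random.default_rng(0)
    L, dk = 5, 8                         # sequence length, head dimension (hypothetical)
    X = rng.standard_normal((L, dk))
    Wq, Wk, Wv = (rng.standard_normal((dk, dk)) for _ in range(3))
    out = dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
    print(out.shape)                     # (5, 8): one attention-mixed vector per token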

Type
doctoral thesis
DOI
10.5075/epfl-thesis-10948
Author(s)
Cui, Hugo Chao  
Advisors
Zdeborová, Lenka  
Jury
Prof. Laurent Villard (president); Prof. Lenka Zdeborová (thesis director); Prof. Nicolas Flammarion, Prof. Giulio Biroli, Prof. Joan Bruna (examiners)
Date Issued
2024
Publisher
EPFL
Publisher place
Lausanne
Public defense date
2024-06-24
Thesis number
10948
Number of pages
284
Subjects
Machine Learning • Statistical Physics • High-dimensional asymptotics • Deep Neural Networks • Random Features • Gaussian Universality • Kernels • Attention mechanisms • Generative models
EPFL units
SPOC1  
Faculty
SB  
School
IPHYS  
Doctoral School
EDPY  
Available on Infoscience
June 19, 2024
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/208803