Random matrix methods for high-dimensional machine learning models
In the rapidly evolving landscape of machine learning research, neural networks stand out with their ever-expanding number of parameters and their reliance on increasingly large datasets. The financial and computational cost of the training phase has sparked debate and raised concerns about its environmental impact. It has therefore become paramount to construct a theoretical framework that provides deeper insight into how model performance scales with the size of the data, the number of parameters, and the number of training epochs.
This thesis analyzes such large machine learning models through a theoretical lens. The sheer size of these models makes them amenable to statistical methods in the high-dimensional limit, akin to the thermodynamic limit in statistical physics.
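For concreteness, the high-dimensional regime invoked here is typically the proportional limit, stated schematically below (the symbols n, d, and \psi are our own illustrative notation, not fixed by this summary):

```latex
% Proportional high-dimensional limit: both the number of samples n and
% the number of parameters d grow, while their ratio stays finite.
n,\, d \;\to\; \infty, \qquad \frac{n}{d} \;\to\; \psi \in (0, \infty)
```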
Our approach builds on results from random matrix theory, the study of large matrices with random entries. We dive deeply into this field and draw on a spectrum of tools and techniques that underpin our investigation of these models across various settings.
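As a minimal illustration of the kind of random-matrix computation involved (a sketch of our own, not code from the thesis), the snippet below compares the eigenvalue histogram of a sample covariance matrix with the classical Marchenko-Pastur density:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative dimensions: n samples, d features, aspect ratio psi = d / n.
n, d = 4000, 1000
psi = d / n

# Sample covariance of i.i.d. standard Gaussian data.
X = np.random.randn(n, d)
S = X.T @ X / n
eigs = np.linalg.eigvalsh(S)

# Marchenko-Pastur density for an identity population covariance (psi <= 1).
lam_minus, lam_plus = (1 - np.sqrt(psi)) ** 2, (1 + np.sqrt(psi)) ** 2
x = np.linspace(lam_minus, lam_plus, 500)
mp_density = np.sqrt((lam_plus - x) * (x - lam_minus)) / (2 * np.pi * psi * x)

plt.hist(eigs, bins=60, density=True, alpha=0.5, label="empirical spectrum")
plt.plot(x, mp_density, label="Marchenko-Pastur law")
plt.xlabel("eigenvalue")
plt.legend()
plt.show()
```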
We begin by constructing a model from linear regression, then extend and build upon it to accommodate a wider range of architectures, culminating in a model that closely resembles the structure of a multi-layer neural network.
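One standard intermediate step in such a progression is the random-features model, where a fixed random layer feeds a trainable linear readout, mimicking a two-layer network; the following is a minimal sketch under our own notational assumptions (dimensions and the ridge penalty are placeholders):

```python
import numpy as np

# Illustrative dimensions: d inputs, p random features, n samples.
d, p, n = 100, 300, 500

X = np.random.randn(n, d)                  # data
W = np.random.randn(p, d) / np.sqrt(d)     # fixed random first layer
Z = np.maximum(W @ X.T, 0).T               # ReLU features, shape (n, p)

# Trainable linear readout fitted by ridge regression on targets y.
y = np.random.randn(n)                     # placeholder targets
ridge = 1e-2
a = np.linalg.solve(Z.T @ Z + ridge * np.eye(p), Z.T @ y)
predictions = Z @ a
```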
Under gradient-flow dynamics, we then derive analytical formulas predicting the learning curves of both the training and generalization errors. The resulting equations reveal several phenomena emerging from the dynamics, such as double descent and specific descent structures over time.
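For intuition about where such learning curves come from, gradient flow on a plain least-squares loss already admits a closed form in the eigenbasis of the empirical covariance; the sketch below (a simplified illustration, not the thesis's derivation) traces the training error over time mode by mode:

```python
import numpy as np

# Least-squares loss L(w) = ||X w - y||^2 / (2 n); gradient flow
# dw/dt = -X^T (X w - y) / n decouples in the eigenbasis of X^T X / n.
n, d = 500, 100
X = np.random.randn(n, d)
w_star = np.random.randn(d)
y = X @ w_star

evals, V = np.linalg.eigh(X.T @ X / n)
b = V.T @ (X.T @ y / n)                    # projected signal per eigenmode

def w_at(t):
    # Closed-form solution from w(0) = 0: mode i relaxes at rate evals[i].
    coef = np.where(
        evals > 0,
        (1 - np.exp(-evals * t)) * b / np.maximum(evals, 1e-12),
        0.0,
    )
    return V @ coef

for t in [0.1, 1.0, 10.0, 100.0]:
    train_err = np.mean((X @ w_at(t) - y) ** 2)
    print(f"t = {t:6.1f}   training error = {train_err:.3e}")
```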
We then take a detour to explore the dynamics of the rank-one matrix estimation problem, commonly referred to as the spiked Wigner model. This model is particularly intriguing due to a phase transition with respect to the signal-to-noise ratio, as well as the challenges posed by the non-convexity of the loss function and the non-linearity of the learning equations. Subsequently, we address the extensive-rank matrix denoising problem, an extension of the previous model. It is of particular interest for sample covariance matrix estimation, and presents further challenges stemming from the initialization and from tracking the alignment of eigenvectors.
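As an illustration of the phase transition mentioned above, the following sketch (our own code, using the standard normalization in which the critical signal-to-noise ratio is 1, the classical BBP threshold) generates spiked Wigner matrices and reports the top eigenvalue and its alignment with the planted spike:

```python
import numpy as np

def spiked_wigner(n, snr, rng):
    """Y = sqrt(snr) * x x^T + W / sqrt(n), unit-norm spike x, GOE-like noise W."""
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)                 # unit-norm planted spike
    G = rng.standard_normal((n, n))
    W = (G + G.T) / np.sqrt(2)             # symmetric Gaussian noise
    return np.sqrt(snr) * np.outer(x, x) + W / np.sqrt(n), x

rng = np.random.default_rng(0)
n = 2000
for snr in [0.5, 1.0, 2.0, 4.0]:
    Y, x = spiked_wigner(n, snr, rng)
    evals, evecs = np.linalg.eigh(Y)
    overlap = abs(evecs[:, -1] @ x)        # alignment with the planted spike
    # Below snr = 1 the top eigenvalue sticks to the bulk edge (near 2) and
    # the overlap vanishes as n grows; above it, both detach from the bulk.
    print(f"snr = {snr:3.1f}  top eigval = {evals[-1]:.2f}  overlap = {overlap:.2f}")
```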