Authors: Neumayer, Sebastian Jonas; Chizat, Lenaic; Unser, Michael
Record date: 2024-03-18
Date issued: 2024-01-01
URI: https://infoscience.epfl.ch/handle/20.500.14299/206505
Web of Science ID: WOS:001168593000001

Abstract: In supervised learning, the regularization path is sometimes used as a convenient theoretical proxy for the optimization path of gradient descent initialized from zero. In this paper, we study a modification of the regularization path for infinite-width 2-layer ReLU neural networks with a nonzero initial distribution of the weights at different scales. By exploiting a link with unbalanced optimal-transport theory, we show that, despite the non-convexity of 2-layer network training, this problem admits an infinite-dimensional convex counterpart. We formulate the corresponding functional-optimization problem and investigate its main properties. In particular, we show that, as the scale of the initialization ranges between 0 and +infinity, the associated path interpolates continuously between the so-called kernel and rich regimes. Numerical experiments confirm that, in our setting, the scaling path and the final states of the optimization path behave similarly, even beyond these extreme points.

Keywords: Technology; Gradient-Descent Training; Regularization Path; Neural Tangent Kernel; Gamma-Convergence; Hellinger-Kantorovich Distance

Title: On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks

Type: text::journal::journal article::research article