Title: Deep Learning Theory Through the Lens of Diagonal Linear Networks
Authors: Flammarion, Nicolas Henri Bernard; Pesme, Scott William
Date: 2024-06-04
DOI: 10.5075/epfl-thesis-10589
Handle: https://infoscience.epfl.ch/handle/20.500.14299/208225
Type: doctoral thesis
Language: English
Keywords: theory of deep learning; diagonal linear networks; implicit regularisation; non-convex optimisation; mirror descent

Abstract: In this PhD manuscript, we explore optimisation phenomena which occur in complex neural networks through the lens of $2$-layer diagonal linear networks. This rudimentary architecture, which consists of a two-layer feedforward linear network with a diagonal inner weight matrix, has the advantage of revealing interesting training characteristics while keeping the theoretical analysis clean and insightful. The manuscript is composed of four parts. The first serves as a general introduction to this architecture and provides results on the optimisation trajectory of gradient flow, upon which the rest of the manuscript is built. The second part focuses on saddle-to-saddle dynamics: taking the initialisation scale of the gradient flow to zero, we prove the existence of an asymptotic learning trajectory in which the coordinates are learnt incrementally, and we describe it. In the third part we focus on the effect of various hyperparameters (namely the batch size, the stepsize and the momentum parameter) on the solution recovered by the corresponding gradient method. The fourth and last part takes a slightly different point of view: an underlying mirror-descent structure emerges when analysing gradient descent on diagonal linear networks and on slightly more complex architectures, which motivates a deeper understanding of mirror-descent trajectories. In this context, we prove the convergence of the mirror flow in the linear classification setting towards a maximum-margin separating hyperplane.
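
As an illustrative aside, not taken from the record itself: the $2$-layer diagonal linear network described in the abstract is commonly written with two weight vectors $u, v \in \mathbb{R}^d$ playing the role of the two layers. The sketch below uses this standard parameterisation, a squared loss, and a generic convex potential $\phi$ for the mirror flow; all three are assumptions made for illustration rather than quotations from the thesis.

% Minimal sketch (assumption: standard diagonal-linear-network
% parameterisation from the literature, not text quoted from this record).
% The inner weight matrix is diag(u), so the effective linear predictor
% is the elementwise product u \odot v.
\[
  f_{u,v}(x) \;=\; \big\langle v,\, \operatorname{diag}(u)\, x \big\rangle
            \;=\; \big\langle u \odot v,\; x \big\rangle,
  \qquad u, v \in \mathbb{R}^d .
\]
% Gradient flow is then run on a loss over the parameters (u, v), e.g. the
% squared loss on data (x_i, y_i), i = 1, ..., n (illustrative choice):
\[
  L(u, v) \;=\; \frac{1}{2n} \sum_{i=1}^{n}
    \big( \langle u \odot v,\, x_i \rangle - y_i \big)^2 .
\]
% The mirror-descent structure mentioned in the abstract refers to dynamics
% of the following form on the effective predictor beta_t, for some convex
% potential phi (its exact form for diagonal networks is part of the thesis
% and is not reproduced here):
\[
  \frac{\mathrm{d}}{\mathrm{d}t}\, \nabla \phi(\beta_t) \;=\; -\,\nabla L(\beta_t).
\]

Here $\odot$ denotes the elementwise product; the interest of this parameterisation is that plain gradient flow on $(u, v)$ induces a non-trivial trajectory, and hence a non-trivial implicit bias, on the effective predictor $\beta = u \odot v$, which is the object studied throughout the manuscript.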