Deep Learning Theory Through the Lens of Diagonal Linear Networks
In this PhD manuscript, we explore optimisation phenomena which occur in complex neural networks through the lens of $2$-layer diagonal linear networks. This rudimentary architecture, which consists of a two-layer feedforward linear network with a diagonal inner weight matrix, has the advantage of revealing interesting training characteristics while keeping the theoretical analysis clean and insightful.
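Concretely, in the standard formulation used in this literature (the symbols $u$, $v$ and $\beta$ below are illustrative notation rather than the manuscript's own), the prediction on an input $x \in \mathbb{R}^d$ is
\[
f_{u,v}(x) \;=\; \langle v,\ \mathrm{diag}(u)\, x \rangle \;=\; \langle u \odot v,\ x \rangle, \qquad \beta := u \odot v ,
\]
so that the effective linear predictor $\beta$ is the elementwise product of the two layers.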
The manuscript is composed of four parts. The first serves as a general introduction to the architecture and provides results on the optimisation trajectory of gradient flow, upon which the rest of the manuscript is built. The second part focuses on saddle-to-saddle dynamics: taking the initialisation scale of the gradient flow to zero, we prove the existence of, and describe, an asymptotic learning trajectory in which coordinates are learnt incrementally. In the third part we focus on the effect of various hyperparameters (namely the batch size, the stepsize and the momentum parameter) on the solution recovered by the corresponding gradient method. The fourth and last part takes a slightly different point of view. An underlying mirror-descent structure emerges when analysing gradient descent on diagonal linear networks and on slightly more complex architectures, which motivates a deeper understanding of mirror-descent trajectories. In this context, we prove the convergence of the mirror flow in the linear classification setting towards a maximum-margin separating hyperplane.
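As an illustration of this structure (a sketch under common assumptions from the diagonal-linear-network literature, with $\phi$, $\beta_t$ and $L$ used as generic notation rather than the manuscript's), gradient flow on the weights can often be rewritten as a mirror flow on the effective predictor,
\[
\frac{\mathrm{d}}{\mathrm{d}t}\, \nabla \phi(\beta_t) \;=\; -\nabla L(\beta_t),
\]
where $L$ is the training loss and $\phi$ is a convex potential of hyperbolic-entropy type whose shape depends on the initialisation scale.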