Stable Optimization in Deep Learning: Geometry and Games
Training instabilities are ubiquitous in deep learning and have led to many heuristics aimed at stabilizing the learning process. From an optimization standpoint, such instabilities often arise when the local model used by the algorithm fails to faithfully capture the underlying structure of the problem. This thesis takes a principled approach to address this issue by designing optimization methods that better align with the geometry and game-theoretic nature of the problems arising in deep learning.
In Part I, we focus on minimization problems and propose a family of geometry-aware algorithms that adapt to the structure of neural networks. By explicitly incorporating norm constraints and scale-invariant updates, these methods allow for stable and efficient training of large models with large batch sizes, all without incurring additional memory overhead.
In Part II, we move on to the significantly more challenging problem of multi-agent games. These settings are known to exhibit unstable dynamics, such as limit cycles and divergence. We show that such pathologies can be avoided through simple, local update rules, even when classical assumptions like monotonicity fail. We introduce new algorithmic frameworks based on extragradient and proximal methods.
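To illustrate the kind of instability mentioned above, the following minimal sketch compares simultaneous gradient descent-ascent with the classical extragradient method on the toy bilinear game min_x max_y x*y. This is not the thesis's algorithm: the game, step size, iteration count, and function names are assumptions chosen for illustration, and the example is monotone, so it does not cover the non-monotone regime the thesis targets.

```python
import math


def field(x, y):
    # Game vector field for min_x max_y f(x, y) = x * y:
    # (df/dx, -df/dy) = (y, -x). The unique equilibrium is (0, 0).
    return y, -x


def gda_step(x, y, lr):
    # Simultaneous gradient descent-ascent: one step along the field.
    gx, gy = field(x, y)
    return x - lr * gx, y - lr * gy


def extragradient_step(x, y, lr):
    # Extrapolation: take a trial step from the current point.
    gx, gy = field(x, y)
    x_half, y_half = x - lr * gx, y - lr * gy
    # Update: step from the current point using the field at the trial point.
    gx, gy = field(x_half, y_half)
    return x - lr * gx, y - lr * gy


def run(step, iters=200, lr=0.1):
    # Return the distance to the equilibrium after `iters` steps from (1, 1).
    x, y = 1.0, 1.0
    for _ in range(iters):
        x, y = step(x, y, lr)
    return math.hypot(x, y)


if __name__ == "__main__":
    print("initial distance:", math.hypot(1.0, 1.0))
    print("gradient descent-ascent:", run(gda_step))
    print("extragradient:", run(extragradient_step))
```

On this game each descent-ascent step multiplies the squared distance to the equilibrium by 1 + lr^2, so the iterates spiral outward, while each extragradient step multiplies it by 1 - lr^2 + lr^4, which is below 1 for any step size lr < 1.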
Across both parts, we address the deterministic and stochastic settings. In the stochastic case, we incorporate momentum-based gradient estimators that reduce variance without requiring increasing batch sizes. This plays a central role in ensuring convergence of the proposed methods, both in theory and in practice.
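As a rough illustration of how a momentum-based estimator can reduce variance at a fixed batch size, the sketch below compares raw noisy gradients with their exponential moving average at a fixed point. The quadratic objective, the Gaussian noise model, the averaging weight `alpha`, and all function names are assumptions for illustration; the estimators used in the thesis may differ.

```python
import random
import statistics


def noisy_grad(x, rng, sigma=1.0):
    # Stochastic gradient of f(x) = 0.5 * x**2 with additive Gaussian noise
    # (a hypothetical noise model, batch size 1).
    return x + rng.gauss(0.0, sigma)


def compare_variance(x=2.0, alpha=0.1, steps=5000, burn_in=100, seed=0):
    # Returns (variance of raw gradients, variance of the momentum estimate),
    # both measured at the fixed point x after a short burn-in.
    rng = random.Random(seed)
    d = noisy_grad(x, rng)  # initialize the estimator with one sample
    raw, averaged = [], []
    for t in range(steps):
        g = noisy_grad(x, rng)
        # Momentum estimator: exponential moving average of past gradients.
        d = (1.0 - alpha) * d + alpha * g
        if t >= burn_in:
            raw.append(g)
            averaged.append(d)
    return statistics.pvariance(raw), statistics.pvariance(averaged)


if __name__ == "__main__":
    raw_var, avg_var = compare_variance()
    print("raw gradient variance:", raw_var)
    print("momentum estimate variance:", avg_var)
```

With independent noise of variance sigma^2, the stationary variance of this average is alpha / (2 - alpha) times sigma^2, so a smaller `alpha` yields a less noisy estimate. The price is bias once the iterate moves, since older gradients were evaluated at earlier points.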