Abstract

Non-convex constrained optimization has become a powerful framework for modeling a wide range of machine learning problems, with applications in k-means clustering, large-scale semidefinite programs (SDPs), and various other tasks. As the performance of these methods on real-world tasks has shown promising results, the need to develop efficient algorithms and to understand their theoretical properties has become increasingly important. Augmented Lagrangian methods provide strong tools for solving constrained problems by breaking them down into a sequence of relatively easier unconstrained subproblems. The theoretical properties of these methods have been studied extensively for problems in which both the objective function and the constraints are convex, and there have been efforts to provide convergence guarantees to first-order stationary points of constrained problems when only the objective function is non-convex. However, the scenario in which the objective function and the constraints are both non-convex has yet to be sufficiently explored.

This thesis is dedicated to the development and theoretical analysis of algorithms grounded in the augmented Lagrangian method, with an emphasis on efficiency and practicality. First, we introduce a generic inexact augmented Lagrangian framework that allows either first-order or second-order subsolvers to be used in the subproblems, attaining convergence guarantees to first- (or second-)order stationary points of the original constrained problem with an algorithm that can be implemented in practice. The success of the algorithm relies on a verifiable geometric regularity condition. We showcase the effectiveness of the algorithm on a range of numerical examples, including k-means clustering and an ℓ∞ image denoising problem that incorporates a generative adversarial network (GAN) prior in order to counteract adversarial examples.

Next, we turn our attention to a more specialized algorithm aimed at solving SDPs by factorizing the decision variable. The resulting optimization problems are inherently non-convex and nonlinear. We establish the convergence rate of an augmented Lagrangian method that combines aspects of both the linearized and the inexact augmented Lagrangian methods.

Finally, we present a linearized alternating direction method of multipliers (ADMM) framework that is more practical and requires only two consecutive proximal gradient steps on the primal variables and a gradient ascent step on the dual variable. This framework is designed to address the increasingly important problem of minimizing a separable non-convex composite objective over two blocks of variables subject to nonlinear constraints. Our analysis recovers known convergence rates with a single-loop algorithm that is less complex than the inexact augmented Lagrangian variants, and our framework accommodates the linearized augmented Lagrangian method as a special case. Numerical evidence on various machine learning tasks, including causal learning, clustering, and maximum-cut problems, illustrates that the proposed algorithm is versatile, scalable, and accurate while requiring minimal tuning.
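
To make the inexact augmented Lagrangian template above concrete, the following is a minimal sketch, not the thesis algorithm: it approximately minimizes the augmented Lagrangian L_beta(x, y) = f(x) + <y, A(x)> + (beta/2)||A(x)||^2 over x, then takes a gradient ascent step on the dual variable y. The fixed penalty beta, the off-the-shelf L-BFGS subsolver, and the iteration budgets are all illustrative assumptions.

    import numpy as np
    from scipy.optimize import minimize

    def inexact_alm(f, f_grad, A, A_jac, x0, outer=20, beta=10.0):
        # Approximately solves  min_x f(x)  s.t.  A(x) = 0,  where f and A
        # may be non-convex. f_grad(x) is grad f(x); A_jac(x) is the Jacobian of A.
        x, y = x0.copy(), np.zeros(len(A(x0)))
        for _ in range(outer):
            # Augmented Lagrangian and its gradient for the current (y, beta):
            #   L(x) = f(x) + <y, A(x)> + (beta / 2) * ||A(x)||^2
            L = lambda x: f(x) + y @ A(x) + 0.5 * beta * A(x) @ A(x)
            L_grad = lambda x: f_grad(x) + A_jac(x).T @ (y + beta * A(x))
            # Inexact subproblem solve with a capped iteration budget.
            x = minimize(L, x, jac=L_grad, method="L-BFGS-B",
                         options={"maxiter": 50}).x
            y = y + beta * A(x)  # gradient ascent step on the dual variable
        return x, y

    # Toy usage: minimize x0 + x1 on the unit circle; the minimizer is
    # (-1/sqrt(2), -1/sqrt(2)).
    f = lambda x: x[0] + x[1]
    f_grad = lambda x: np.ones(2)
    A = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 1.0])
    A_jac = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]]])
    x_star, y_star = inexact_alm(f, f_grad, A, A_jac, np.array([2.0, 1.0]))

In the thesis framework, the penalty and subproblem tolerances follow schedules tied to the geometric regularity condition; the fixed values above are only for illustration.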
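The single-loop structure of the linearized ADMM framework, two consecutive proximal gradient steps on the primal blocks followed by one gradient ascent step on the dual, can be sketched as follows. For readability this sketch substitutes a linear coupling constraint A x + B z = c and an l1 regularizer with its soft-thresholding prox for the nonlinear-constraint setting of the thesis; the step sizes beta, sigma, eta, and reg are likewise illustrative assumptions.

    import numpy as np

    def soft_threshold(v, t):
        # Prox of t * ||.||_1, standing in for any proximable regularizer.
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def linearized_admm(grad_h1, grad_h2, A, B, c, x, z, steps=500,
                        beta=1.0, sigma=1.0, eta=0.01, reg=0.1):
        # Single loop for  min_{x,z} h1(x) + reg*||x||_1 + h2(z) + reg*||z||_1
        #                  s.t.  A x + B z = c.
        y = np.zeros_like(c)
        for _ in range(steps):
            # Proximal gradient step on x (augmented Lagrangian, linearized at x).
            r = A @ x + B @ z - c
            x = soft_threshold(x - eta * (grad_h1(x) + A.T @ (y + beta * r)),
                               eta * reg)
            # Proximal gradient step on z, using the freshly updated x.
            r = A @ x + B @ z - c
            z = soft_threshold(z - eta * (grad_h2(z) + B.T @ (y + beta * r)),
                               eta * reg)
            # One gradient ascent step on the dual variable.
            y = y + sigma * (A @ x + B @ z - c)
        return x, z, y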
