On the Complexity of Optimization: Curved Spaces and Benign Landscapes
This thesis is concerned with continuous optimization, especially problems that are not convex. In general, nonconvex optimization is hard. Yet for many nonconvex problems arising in practice, local search algorithms (e.g., gradient descent) reliably and quickly find the global solution. This work aims to understand why, focusing on three core questions:
- What special structure in these problems enables simple algorithms to succeed?
- What computational resources (e.g., time or memory) do these algorithms require?
- What are the fundamental limits beyond which no algorithm can succeed?
In the first part of this thesis, we focus on a generalization of convexity to Riemannian manifolds: geodesic convexity (g-convexity), that is, convexity along geodesics rather than along straight lines. Several optimization problems arising in statistics and theoretical computer science are nonconvex, yet become geodesically convex when the domain is equipped with an appropriately chosen Riemannian metric. It is well understood that algorithms (like Riemannian gradient descent) can solve g-convex problems to global optimality. Our focus is therefore on the complexity of g-convex optimization (Questions 2 and 3). Our main result is a complexity lower bound demonstrating that the curvature of the optimization domain (a Riemannian manifold) necessarily slows down optimization.
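To fix ideas, here is a minimal sketch of a Riemannian gradient descent update, written for illustration rather than taken from the thesis. It minimizes the Rayleigh quotient $x \mapsto x^\top A x$ over the unit sphere (an objective that is not g-convex, so the sketch only illustrates the update rule): the Euclidean gradient is projected onto the tangent space, and a retraction (here, renormalization) maps the step back onto the manifold. The step size and iteration count are arbitrary demo choices.

```python
import numpy as np

def riemannian_gradient_descent(A, x0, step=0.1, iters=500):
    """Minimize f(x) = x^T A x over the unit sphere S^{n-1}."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = 2 * A @ x                # Euclidean gradient of f
        rgrad = egrad - (x @ egrad) * x  # project onto the tangent space at x
        x = x - step * rgrad             # step in the tangent direction
        x /= np.linalg.norm(x)           # retract back onto the sphere
    return x

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                        # random symmetric matrix
x = riemannian_gradient_descent(A, rng.standard_normal(5))
print(x @ A @ x)                         # close to the smallest eigenvalue of A
```

The only manifold-specific ingredients are the tangent-space projection and the retraction; swapping those two lines adapts the same template to other Riemannian manifolds.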
Some nonconvex functions arising in practice can be reformulated as g-convex functions. Others, however, cannot, and yet algorithms are empirically observed to solve them to global optimality. In the second part of this thesis, we analyze two such problems, with the aim of identifying the special structure that makes them tractable (Question 1). For this, we adopt the perspective of landscapes: a function has a benign landscape if all of its local minima are global minima.
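As a simple one-dimensional illustration (not an example from the thesis), a function can be nonconvex and still have a benign landscape:

```latex
f(x) = (x^2 - 1)^2, \qquad f'(x) = 4x(x^2 - 1) = 0 \iff x \in \{-1, 0, 1\}.
```

The critical point $x = 0$ is a local maximum, and $x = \pm 1$ are both global minima with $f(\pm 1) = 0$, so every local minimum is global even though $f$ is not convex.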
The first problem we consider is synchronization on circles and spheres with nonlinear interactions. We derive conditions under which the corresponding landscape is benign, with implications for machine learning, control, and dynamical systems theory. The second problem we consider is sensor network localization, which has applications in data science and robotics. We show that in general its landscape is not benign (spurious local minima exist), but we give a memory-efficient way (Question 2) to relax the problem so that the landscape becomes benign.
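For reference, here are standard formulations of both problems; the notation is ours, and the thesis's setup may differ in its details. In the classical Kuramoto special case of synchronization on the circle, with pairwise sinusoidal coupling over a graph $G = (V, E)$, one minimizes the energy

```latex
E(\theta) = -\sum_{(i,j) \in E} \cos(\theta_i - \theta_j), \qquad \theta_i \in [0, 2\pi),
```

whose landscape is known from prior work to be benign when $G$ is sufficiently dense (for instance, complete); the nonlinear-interaction setting of the thesis generalizes this coupling. A common formulation of sensor network localization, given measured distances $d_{ij}$, is

```latex
\min_{x_1, \dots, x_n \in \mathbb{R}^d} \; \sum_{(i,j) \in E} \big( \|x_i - x_j\|^2 - d_{ij}^2 \big)^2,
```

and a well-known way to relax such problems is to let the points live in a higher dimension $k > d$, though the thesis's specific memory-efficient relaxation may differ.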