Abstract

Principal component analysis (PCA) finds the best linear representation of data and is an indispensable tool in many learning and inference tasks. Classically, the principal components of a dataset are interpreted as the directions that preserve most of its "energy," an interpretation that is theoretically underpinned by the celebrated Eckart-Young-Mirsky theorem. This paper introduces many other ways of performing PCA, with various geometric interpretations, and proves that the corresponding family of nonconvex programs has no spurious local optima and that all of its saddle points are strict. These programs therefore loosely behave like convex problems and can be efficiently solved to global optimality, for example, with certain variants of stochastic gradient descent. Beyond providing new geometric interpretations and enhancing our theoretical understanding of PCA, our findings might pave the way for entirely new approaches to structured dimensionality reduction, such as sparse PCA and nonnegative matrix factorization. More specifically, we study an unconstrained formulation of PCA based on determinant optimization that might provide an elegant alternative to the deflation scheme commonly used in sparse PCA.
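
For reference, the classical "energy" interpretation mentioned above can be written out as follows. This is a standard statement of the Eckart-Young-Mirsky theorem, not a formulation taken from the paper itself; the symbols X, U, r, and the singular triplets (sigma_i, u_i, v_i) are notation assumed here for illustration.

```latex
% Assumed notation: X \in \mathbb{R}^{n \times m} is the data matrix with
% singular value decomposition X = \sum_i \sigma_i u_i v_i^\top,
% where \sigma_1 \ge \sigma_2 \ge \dots \ge 0, and r is the target dimension.

% Eckart-Young-Mirsky: the best rank-r approximation in Frobenius norm
\min_{\operatorname{rank}(B) \le r} \| X - B \|_F^2
  \;=\; \sum_{i > r} \sigma_i^2 ,
\qquad \text{attained at } B = X_r = \sum_{i=1}^{r} \sigma_i u_i v_i^\top .

% Equivalent "energy-preserving" form over orthonormal bases U, using
% \| X - U U^\top X \|_F^2 = \| X \|_F^2 - \| U^\top X \|_F^2 :
\max_{U \in \mathbb{R}^{n \times r},\; U^\top U = I_r} \| U^\top X \|_F^2
  \;=\; \sum_{i=1}^{r} \sigma_i^2 ,
\qquad \text{attained at } U = [\, u_1 \;\cdots\; u_r \,] .
```

In this form, the leading r left singular vectors span the subspace that retains the largest share of the total energy, the squared Frobenius norm of X.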

Details