Conference paper

PCA using graph total variation

Mining useful clusters from high-dimensional data has received significant attention from the signal processing and machine learning communities in recent years. Linear and non-linear dimensionality reduction has played an important role in overcoming the curse of dimensionality. However, such methods often suffer from problems such as high computational complexity (usually associated with nuclear norm minimization), non-convexity (for matrix factorization methods), or susceptibility to gross corruptions in the data. In this paper, we propose a convex, robust, scalable and efficient Principal Component Analysis (PCA) based method to approximate the low-rank representation of high-dimensional datasets via a two-way graph regularization scheme. Compared to exact recovery methods, our method is approximate, in that it enforces a piecewise constant assumption on the samples using a graph total variation and a piecewise smoothness assumption on the features using a graph Tikhonov regularization. Furthermore, it retrieves the low-rank representation in time that is linear in the number of data samples. Clustering experiments on 3 benchmark datasets with different types of corruptions show that our proposed model outperforms 7 state-of-the-art dimensionality reduction models.
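To make the difference between the two graph regularizers concrete, the following minimal sketch evaluates a graph total variation penalty and a graph Tikhonov penalty on a toy signal. It is an illustration only, not the paper's objective or solver: the abstract does not give the exact formulation, so the function names (`graph_tv`, `graph_tikhonov`), the combinatorial Laplacian, the l1 form of the total variation, and the 4-node path graph are all assumptions made here.

```python
import numpy as np

def graph_tv(S, W):
    """Graph total variation (assumed l1 form): sum over edges (i, j)
    of W[i, j] * ||S[i] - S[j]||_1, where the rows of S are graph nodes."""
    i, j = np.triu_indices_from(W, k=1)          # each undirected pair once
    diffs = np.abs(S[i] - S[j]).sum(axis=1)      # l1 distance per node pair
    return float((W[i, j] * diffs).sum())

def graph_tikhonov(S, W):
    """Graph Tikhonov penalty tr(S^T L S) with L = D - W (assumed
    combinatorial Laplacian); equals the sum over edges of
    W[i, j] * ||S[i] - S[j]||_2^2."""
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(S.T @ L @ S))

# Hypothetical toy graph: a path over 4 nodes with unit edge weights.
W_path = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)

# Two signals with the same endpoints: one piecewise constant, one smooth.
step = np.array([[0.0], [0.0], [3.0], [3.0]])
ramp = np.array([[0.0], [1.0], [2.0], [3.0]])

tv_step, tv_ramp = graph_tv(step, W_path), graph_tv(ramp, W_path)
tik_step, tik_ramp = graph_tikhonov(step, W_path), graph_tikhonov(ramp, W_path)

print(tv_step, tv_ramp)    # 3.0 3.0
print(tik_step, tik_ramp)  # 9.0 3.0
```

On this toy graph the total variation charges the sharp step no more than the gradual ramp, so it tolerates piecewise constant structure such as cluster boundaries, while the Tikhonov penalty charges the step three times more and therefore favors smooth variation. In the two-way scheme described above, a penalty of the first kind would act on a graph over the samples and one of the second kind on a graph over the features.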
