Abstract

Conventional linear subspace learning methods like principal component analysis (PCA), linear discriminant analysis (LDA) derive subspaces from the whole data set. These approaches have limitations in the sense that they are linear while the data distribution we are trying to model is typically nonlinear. Moreover, these algorithms fail to incorporate local variations of the intrinsic sample distribution manifold. Therefore, these algorithms are ineffective when applied on large scale datasets. Kernel versions of these approaches can alleviate the problem to certain degree but face a serious computational challenge when data set is large, where the computing involves Eigen/QP problems of size N x N. When N is large, kernel versions are not computationally practical. To tackle the aforementioned problems and improve recognition/searching performance, especially on large scale image datasets, we propose a novel local subspace indexing model for image search termed Subspace Indexing Model on Grassmann Manifold (SIM-GM). SIM-GM partitions the global space into local patches with a hierarchical structure; the global model is, therefore, approximated by piece-wise linear local subspace models. By further applying the Grassmann manifold distance, SIM-GM is able to organize localized models into a hierarchy of indexed structure, and allow fast query selection of the optimal ones for classification. Our proposed SIM-GM enjoys a number of merits: 1) it is able to deal with a large number of training samples efficiently; 2) it is a query-driven approach, i.e., it is able to return an effective local space model, so the recognition performance could be significantly improved; 3) it is a common framework, which can incorporate many learning algorithms. Theoretical analysis and extensive experimental results confirm the validity of this model.

Details