Abstract

Locality-Sensitive Hashing (LSH) approximates nearest neighbors in high dimensions by projecting original data into low-dimensional subspaces. The basic idea is to hash data samples to ensure that the probability of collision is much higher for samples that are close to each other than for those that are far apart. However, by applying k random hashing functions on original data, LSH fails to find the most discriminant hashing-subspaces, so the nearest neighbor approximation is inefficient. To alleviate this problem, we propose the Grassmann Hashing (GRASH) for approximate nearest neighbor search in high dimensions. GRASH frrst introduces a set of subspace candidates from Linear Discriminant Analysis (LDA). Then it applies Grassmann metric to select the optimal subspaces for hashing. Finally, it generates hashing codes based on non-uniform bucket size design motivated by Lloyd-Max quantization. The proposed GRASH model enjoys a number of merits: 1) GRASH introduces the Grassmann metric to measure the similarity between different hashing subspaces, so the hashing function can better capture the data diversity; 2) GRASH obtains the subspace candidates from LDA, so it incorporates the discriminant information into the hashing functions; 3) GRASH extends LSH's 1-d hashing subspaces to m-d, i.e. it is a multidimensional extension of hashing approximation; 4) motivated by Lloyd-Max quantization, GRASH applies non-uniform size bucket to generate hashing codes, so the distortion can be minimized. Experimental results on a number of datasets confirm the validity of our proposed model.

Details

Actions