HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces

Arora, Akhil; Sinha, Sakshi; Kumar, Piyush; Bhattacharya, Arnab

doi:10.14778/3204028.3204034

research article


April 1, 2018

Proceedings of the VLDB Endowment

Nearest neighbor search over large databases in high-dimensional spaces is inherently difficult due to the curse of dimensionality. Some form of approximation is therefore necessary to solve the problem in practice. In this paper, we propose a novel yet simple indexing scheme, HD-Index, to answer approximate k-nearest neighbor queries in massive high-dimensional databases. HD-Index consists of a set of novel hierarchical structures called RDB-trees built on Hilbert keys of database objects. The leaves of the RDB-trees store distances of database objects to reference objects, thereby allowing efficient pruning using distance filters. In addition to the triangle inequality, we also use the Ptolemaic inequality to produce better lower bounds. Experiments on massive (up to billion-scale) high-dimensional (up to 1000+ dimensions) datasets show that HD-Index is effective, efficient, and scalable.
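The distance filters mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`distances_to_refs`, `triangle_lower_bound`, `ptolemaic_lower_bound`, `lower_bound`) are hypothetical, and it assumes Euclidean distance, for which both inequalities hold. Given precomputed distances from a query and a database object to a shared set of reference objects, each function returns a lower bound on the true query-object distance without touching the object's coordinates.

```python
import math
from itertools import combinations


def distances_to_refs(point, refs):
    """Distances from one point to every reference object (hypothetical helper)."""
    return [math.dist(point, r) for r in refs]


def triangle_lower_bound(q_to_refs, o_to_refs):
    """Triangle inequality: d(q, o) >= |d(q, r) - d(o, r)| for every reference r."""
    return max(abs(dq - do) for dq, do in zip(q_to_refs, o_to_refs))


def ptolemaic_lower_bound(q_to_refs, o_to_refs, ref_to_ref):
    """Ptolemaic inequality, for each pair of references (ri, rj):
    d(q, o) >= |d(q, ri) * d(o, rj) - d(q, rj) * d(o, ri)| / d(ri, rj).
    """
    best = 0.0
    for i, j in combinations(range(len(q_to_refs)), 2):
        d_refs = ref_to_ref[i][j]
        if d_refs > 0:  # skip coincident references to avoid division by zero
            cross = q_to_refs[i] * o_to_refs[j] - q_to_refs[j] * o_to_refs[i]
            best = max(best, abs(cross) / d_refs)
    return best


def lower_bound(q_to_refs, o_to_refs, ref_to_ref):
    """Tightest bound available from the two filters combined."""
    return max(
        triangle_lower_bound(q_to_refs, o_to_refs),
        ptolemaic_lower_bound(q_to_refs, o_to_refs, ref_to_ref),
    )


if __name__ == "__main__":
    # Toy data, chosen only for illustration.
    refs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    ref_to_ref = [[math.dist(a, b) for b in refs] for a in refs]
    query, obj = (0.3, 0.7), (2.0, -1.0)

    q_d = distances_to_refs(query, refs)
    o_d = distances_to_refs(obj, refs)  # in an index this would be stored, not recomputed
    print("lower bound:", lower_bound(q_d, o_d, ref_to_ref))
    print("true distance:", math.dist(query, obj))
```

In a kNN search, a candidate whose lower bound already exceeds the distance to the current k-th best result can be discarded without computing its exact distance, which is the pruning effect the abstract describes.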

Type

research article

DOI

10.14778/3204028.3204034

Web of Science ID

WOS:000452530000007

Authors

Arora, Akhil

•

Sinha, Sakshi

•

Kumar, Piyush

•

Bhattacharya, Arnab

Publication date

2018-04-01

Publisher

ASSOC COMPUTING MACHINERY

Published in

Proceedings of the VLDB Endowment

Volume

11

Issue

8

Start page

906

End page

919

Subjects

Computer Science, Inf...

Computer Science

similarity search

product quantization

access methods

nearest

algorithm

performance

Peer reviewed

REVIEWED

EPFL units

DLAB

Available on Infoscience

December 20, 2018

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/153119