Cluster-and-Conquer: When Randomness Meets Graph Locality

Giakkoupis, George; Kermarrec, Anne-Marie; Ruas, Olivier; Taiani, Francois

doi:10.1109/ICDE51399.2021.00195

conference paper

January 1, 2021

2021 IEEE 37th International Conference on Data Engineering (ICDE 2021)

37th IEEE International Conference on Data Engineering (IEEE ICDE)

K-Nearest-Neighbors (KNN) graphs are central to many emblematic data-mining and machine-learning applications. Some of the most efficient KNN graph algorithms are incremental and local: they start from a random graph, which they incrementally improve by traversing neighbors-of-neighbors links. Unfortunately, the initial random graph exhibits poor graph locality, leading to many unnecessary similarity computations. In this paper, we remove this drawback with Cluster-and-Conquer (C² for short). Cluster-and-Conquer boosts the starting configuration of greedy algorithms thanks to a novel lightweight clustering mechanism, dubbed FastRandomHash. FastRandomHash leverages randomness and recursion to pre-cluster similar nodes at a very low cost. Our extensive evaluation on real datasets shows that Cluster-and-Conquer significantly outperforms existing approaches, including LSH, yielding speedups of up to ×4.42 while even improving KNN quality.
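
To make the "incremental and local" baseline concrete, the following is a minimal Python sketch of a generic greedy KNN-graph refinement: it starts from a random graph and repeatedly improves each node's neighborhood using neighbors-of-neighbors. It is not the paper's Cluster-and-Conquer or FastRandomHash, whose details this record does not give. The function names (`jaccard`, `greedy_knn`), the set-based profiles, the Jaccard similarity, and the `rounds` and `seed` parameters are all illustrative assumptions.

```python
import random


def jaccard(a, b):
    """Jaccard similarity between two sets (an assumed, illustrative metric)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def greedy_knn(profiles, k, rounds=10, seed=0):
    """Refine a random k-neighbor graph via neighbors-of-neighbors links.

    profiles: list of sets, one per node (hypothetical input format).
    Requires len(profiles) > k. Returns {node: set of k neighbor ids}.
    """
    rng = random.Random(seed)
    n = len(profiles)
    # Start from a random graph: k distinct random neighbors per node.
    knn = {
        u: set(rng.sample([v for v in range(n) if v != u], k))
        for u in range(n)
    }
    for _ in range(rounds):
        changed = False
        for u in range(n):
            # Candidates: current neighbors plus their neighbors.
            candidates = set(knn[u])
            for v in knn[u]:
                candidates |= knn[v]
            candidates.discard(u)
            # Keep the k most similar candidates; ties broken by node id.
            ranked = sorted(
                candidates,
                key=lambda v: (-jaccard(profiles[u], profiles[v]), v),
            )
            best = set(ranked[:k])
            if best != knn[u]:
                knn[u] = best
                changed = True
        if not changed:
            break  # No neighborhood changed in this round: converged.
    return knn
```

Each similarity evaluation in the inner loop is the cost the abstract refers to: with a random starting graph, most early candidates are dissimilar nodes, so many of these computations are wasted. Passing `rounds=0` returns the unrefined random graph, which is convenient for comparison.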

Type

conference paper

DOI

10.1109/ICDE51399.2021.00195

Web of Science ID

WOS:000687830800187

Authors

Giakkoupis, George

Kermarrec, Anne-Marie

Ruas, Olivier

Taiani, Francois

Publication date

2021-01-01

Publisher

IEEE COMPUTER SOC

Published in

2021 IEEE 37th International Conference on Data Engineering (ICDE 2021)

ISBN of the book

978-1-7281-9184-3

Publisher place

Los Alamitos

Series title/Series vol.

IEEE International Conference on Data Engineering

Start page

2027

End page

2032

Subjects

Computer Science, Inf...

Computer Science, The...

Computer Science

knn graph

big data

Peer reviewed

REVIEWED

EPFL units

SACS

Event name	Event place	Event date
37th IEEE International Conference on Data Engineering (IEEE ICDE)	ELECTR NETWORK	Apr 19-22, 2021

Available on Infoscience

September 25, 2021

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/181713