GoldFinger: Fast & Approximate Jaccard for Efficient KNN Graph Construction

Authors: Guerraoui, Rachid; Kermarrec, Anne-Marie; Niot, Guilhem; Ruas, Olivier; Taiani, Francois
Publication date: 2023-11-01
Repository record date: 2024-02-19
DOI: 10.1109/TKDE.2022.3232689
Record URL: https://infoscience.epfl.ch/handle/20.500.14299/204055
Web of Science ID: WOS:001089176900037
Type: journal article (research article)
Subjects: Technology; KNN graphs; Fingerprint; Similarity

Abstract:
We propose GoldFinger, a new compact and fast-to-compute binary representation of datasets that approximates Jaccard's index. We illustrate the effectiveness of GoldFinger on the emblematic big data problem of K-Nearest-Neighbor (KNN) graph construction. We show that GoldFinger can drastically accelerate a wide range of existing KNN algorithms with little to no overhead. As a side effect, we also show that this compact representation protects users' privacy at no extra cost, because it provides k-anonymity and l-diversity. An extensive evaluation on several realistic datasets shows that our approach reduces computation times by up to 78.9% compared with using the raw data, while incurring only a negligible to moderate loss in KNN quality. We also show that GoldFinger can be applied to KNN queries (a widely used search technique), where it delivers speedups of up to 3.55× over one of the most efficient existing approaches to this problem.
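The abstract does not say how the binary representation is built. The Python below is a minimal sketch, assuming a single-hash bit-array construction. Each item in a user's profile is hashed to one bit of a fixed-width array. Jaccard's index is then estimated as the popcount of the bitwise AND divided by the popcount of the bitwise OR. The sketch plugs this estimate into a brute-force KNN graph. The function names `fingerprint`, `approx_jaccard`, `exact_jaccard`, and `knn_graph`, the 1024-bit default width, and the use of BLAKE2b as the hash are hypothetical choices for illustration. They are not the authors' implementation.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def fingerprint(items: Iterable[str], num_bits: int = 1024) -> int:
    """Hash each item to one position of a num_bits-wide bit array.

    Assumption: a single hash function per item. A Python int serves as the
    bit array. Distinct items may collide on the same bit, which is why the
    similarity estimate is only approximate.
    """
    bits = 0
    for item in items:
        # Use a stable hash; Python's built-in hash() on str is salted per process.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        position = int.from_bytes(digest, "big") % num_bits
        bits |= 1 << position
    return bits


def approx_jaccard(fp_a: int, fp_b: int) -> float:
    """Estimate |A & B| / |A | B| from the popcounts of AND and OR."""
    union_bits = bin(fp_a | fp_b).count("1")
    if union_bits == 0:
        return 0.0
    return bin(fp_a & fp_b).count("1") / union_bits


def exact_jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact Jaccard's index on the raw sets, for comparison."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_graph(profiles: Dict[str, Set[str]], k: int,
              num_bits: int = 1024) -> Dict[str, List[str]]:
    """Brute-force KNN graph that compares fingerprints instead of raw sets."""
    fps = {user: fingerprint(items, num_bits) for user, items in profiles.items()}
    graph: Dict[str, List[str]] = {}
    for user, fp in fps.items():
        scored = [(approx_jaccard(fp, other_fp), other)
                  for other, other_fp in fps.items() if other != user]
        # Highest estimated similarity first; break ties by name for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        graph[user] = [other for _, other in scored[:k]]
    return graph
```

In this sketch, each pairwise comparison reduces to one AND, one OR, and two popcounts on fixed-width integers, whatever the size of the underlying profiles. Smaller `num_bits` values make fingerprints cheaper but raise the collision rate, which trades accuracy for speed.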