Infoscience
 
conference paper

Kernel Density Estimation through Density Constrained Near Neighbor Search

Charikar, Moses
•
Kapralov, Michael
•
Nouri, Navid
•
Siminelakis, Paris
January 1, 2020
2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS 2020)
61st IEEE Annual Symposium on Foundations of Computer Science (FOCS)

In this paper we revisit the kernel density estimation problem: given a kernel K(x, y) and a dataset P of n points in high-dimensional Euclidean space, prepare a data structure that can quickly output, given a query q, a (1 + ε)-approximation to μ := (1/|P|) Σ_{p∈P} K(p, q). First, we give a single data structure based on classical near neighbor search techniques that improves upon or essentially matches the query time and space complexity for all radial kernels considered in the literature so far. We then show how to improve both the query complexity and runtime by using recent advances in data-dependent near neighbor search.
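To fix notation, a minimal sketch of the quantity μ being approximated, computed exactly by a linear scan (the bandwidth parameter `sigma` and the function names are illustrative choices, not from the paper; the paper's data structures answer (1 + ε)-approximate queries in time sublinear in |P|):

```python
import numpy as np

def gaussian_kernel(p, q, sigma=1.0):
    """Gaussian kernel K(p, q) = exp(-||p - q||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((p - q) ** 2) / (2 * sigma ** 2))

def kde_exact(P, q, sigma=1.0):
    """Exact kernel density mu = (1/|P|) * sum over p in P of K(p, q).

    This brute-force baseline costs O(|P|) per query; the data
    structures in the paper trade a (1 + eps) approximation for
    much faster query time."""
    return float(np.mean([gaussian_kernel(p, q, sigma) for p in P]))
```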

We achieve our results by giving a new implementation of the natural importance sampling scheme. Unlike previous approaches, our algorithm first samples the dataset uniformly (considering a geometric sequence of sampling rates), and then uses existing approximate near neighbor search techniques on the resulting smaller dataset to retrieve the sampled points that lie at an appropriate distance from the query. We show that the resulting sampled dataset has strong geometric structure, making approximate near neighbor search return the required samples much more efficiently than for worst-case datasets of the same size. As an example application, we show that this approach yields a data structure that achieves query time μ^{-(1+o(1))/4} and space complexity μ^{-(1+o(1))} for the Gaussian kernel. Our data-dependent approach achieves query time μ^{-0.173-o(1)} and space μ^{-(1+o(1))} for the Gaussian kernel. The data-dependent analysis relies on new techniques for tracking the geometric structure of the input datasets in a recursive hashing process that we hope will be of interest in other applications of near neighbor search.
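The sampling scheme described above can be sketched in a toy form: subsample the dataset at a geometric sequence of rates 2^{-j}, and at rate 2^{-j} keep only the sampled points whose kernel value falls in the band (2^{-(j+1)}, 2^{-j}], i.e. points at an appropriate distance from the query. This is illustrative only and simplifies the paper's scheme: a brute-force scan stands in for the approximate near neighbor queries, and all names and parameter choices are our own.

```python
import numpy as np

def kde_sampling_estimate(P, q, sigma=1.0, seed=0):
    """Toy sketch of importance sampling via geometric sampling rates.

    For level j, each point survives independently with probability
    2^-j, and only surviving points with K(p, q) in (2^-(j+1), 2^-j]
    are kept; rescaling each band's sum by its sampling rate gives an
    unbiased estimate of that band's contribution to mu (points with
    K below 2^-J, J ~ log2 n, are dropped, adding O(1/n) bias)."""
    rng = np.random.default_rng(seed)
    n = len(P)
    # Kernel values of all points; in the real scheme these points
    # would be retrieved by approximate near neighbor search.
    K = np.exp(-np.sum((P - q) ** 2, axis=1) / (2 * sigma ** 2))
    estimate = 0.0
    J = int(np.ceil(np.log2(n))) + 1
    for j in range(J):
        rate = 2.0 ** -j
        sampled = rng.random(n) < rate                    # uniform subsample
        band = (K > 2.0 ** -(j + 1)) & (K <= 2.0 ** -j)   # distance band
        estimate += K[sampled & band].sum() / (rate * n)
    return estimate
```

Large kernel values (points near the query) are sampled at high rates and small ones at low rates, which is what keeps the variance of the rescaled sum under control.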

Type
conference paper
DOI
10.1109/FOCS46700.2020.00025
Web of Science ID

WOS:000652333400017

Author(s)
Charikar, Moses
Kapralov, Michael
Nouri, Navid  
Siminelakis, Paris
Date Issued

2020-01-01

Publisher

IEEE

Publisher place

New York

Published in
2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS 2020)
ISBN of the book

978-1-7281-9621-3

Series title/Series vol.

Annual IEEE Symposium on Foundations of Computer Science

Start page

172

End page

183

Subjects

  • Computer Science, Theory & Methods
  • Computer Science
  • kernel density estimation
  • locality sensitive hashing
  • near neighbor search

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
THL4  
Event name

61st IEEE Annual Symposium on Foundations of Computer Science (FOCS)

Event place

ELECTR NETWORK

Event date

Nov 16-19, 2020

Available on Infoscience
July 3, 2021
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/179669

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.