Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. EPFL thesis
  4. Approximation algorithms for geometric dispersion
 
doctoral thesis

Approximation algorithms for geometric dispersion

Cevallos Manzano, Alfonso Bolívar  
2016

The most basic form of the max-sum dispersion problem (MSD) is as follows: given n points in R^q and an integer k, select a set of k points such that the sum of the pairwise distances within the set is maximal. This is a prominent diversity problem, with wide applications in web search and information retrieval, where one needs to find a small and diverse representative subset of a large dataset. The problem has recently received a great deal of attention in the computational geometry and operations research communities; and since it is NP-hard, research has focused on efficient heuristics and approximation algorithms. Several classes of distance functions have been considered in the literature. Many of the most common distances used in applications are induced by a norm in a real vector space. The focus of this thesis is on MSD over these geometric instances. We provide for it simple and fast polynomial-time approximation schemes (PTASs), as well as improved constant-factor approximation algorithms. We pay special attention to the class of negative-type distances, a class that includes Euclidean and Manhattan distances, among many others. In order to exploit the properties of this class, we apply several techniques and results from the theory of isometric embeddings. We explore the following variations of the MSD problem: matroid and matroid-intersection constraints, knapsack constraints, and the mixed-objective problem that maximizes a combination of the sum of pairwise distances with a submodular monotone function. In addition to approximation algorithms, we present a core-set for geometric instances of low dimension, and we discuss the efficient implementation of some of our algorithms for massive datasets, using the streaming and distributed models of computation.

  • Files
  • Details
  • Metrics
Loading...
Thumbnail Image
Name

EPFL_TH7291.pdf

Access type

openaccess

Size

1.2 MB

Format

Adobe PDF

Checksum (MD5)

d7b1ce7a6b7b61401bc9cebde93b96c8

Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés