Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. EPFL thesis
  4. Algorithms For Clustering Problems : Theoretical Guarantees and Empirical Evaluations
 
doctoral thesis

Algorithms For Clustering Problems : Theoretical Guarantees and Empirical Evaluations

Norouzi Fard, Ashkan  
2018

Clustering is a classic topic in combinatorial optimization and plays a central role in many areas, including data science and machine learning. In this thesis, we first focus on the dynamic facility location problem (i.e., the facility location problem in evolving metrics). We present a new LP-rounding algorithm for facility location problems, which yields the first constant factor approximation algorithm for the dynamic facility location problem. Our algorithm installs competing exponential clocks on clients and facilities, and connects every client by the path that repeatedly follows the smallest clock in the neighborhood. The use of exponential clocks gives rise to several properties that distinguish our approach from previous LP-roundings for facility location problems. In particular, we use \emph{no clustering} and we enable clients to connect through paths of \emph{arbitrary lengths}. In fact, the clustering-free nature of our algorithm is crucial for applying our LP-rounding approach to the dynamic problem.

Furthermore, we present both empirical and theoretical aspects of the $k$-means problem. The best known algorithm for $k$-means with a provable guarantee is a simple local-search heuristic that yields an approximation guarantee of $9+\epsilon$, a ratio that is known to be tight with respect to such methods. We overcome this barrier by presenting a new primal-dual approach that enables us (1) to exploit the geometric structure of $k$-means and (2) to satisfy the hard constraint that at most $k$ clusters are selected without deteriorating the approximation guarantee. Our main result is a $6.357$-approximation algorithm with respect to the standard LP relaxation. Our techniques are quite general and we also show improved guarantees for the general version of $k$-means where the underlying metric is not required to be Euclidean and for $k$-median in Euclidean metrics.

We also improve the running time of our algorithm to almost linear running time and still maintain a provable guarantee. We compare our algorithm with {\sc K-Means++} (a widely studied algorithm) and show that we obtain better accuracy with comparable and even better running time.

  • Files
  • Details
  • Metrics
Type
doctoral thesis
DOI
10.5075/epfl-thesis-8546
Author(s)
Norouzi Fard, Ashkan  
Advisors
Svensson, Ola Nils Anders  
Jury

Prof. Friedrich Eisenbrand (président) ; Prof. Ola Nils Anders Svensson (directeur de thèse) ; Prof. Michael Kapralov, Prof. David Shmoys, Prof. claire Mathieu (rapporteurs)

Date Issued

2018

Publisher

EPFL

Publisher place

Lausanne

Public defense year

2018-07-13

Thesis number

8546

Total of pages

152

Subjects

Linear Programming

•

Approximation Algorithms

•

Clustering

•

Facility Location Problem

•

k-Median

•

k-Means

EPFL units
THL2  
Faculty
IC  
School
IINFCOM  
Doctoral School
EDIC  
Award

EDIC Patrick Denantes Memorial Prize

2018
Available on Infoscience
July 10, 2018
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/147179
Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés