Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. EPFL thesis
  4. Algorithms For Clustering Problems : Theoretical Guarantees and Empirical Evaluations
 
doctoral thesis

Algorithms For Clustering Problems : Theoretical Guarantees and Empirical Evaluations

Norouzi Fard, Ashkan  
2018

Clustering is a classic topic in combinatorial optimization and plays a central role in many areas, including data science and machine learning. In this thesis, we first focus on the dynamic facility location problem (i.e., the facility location problem in evolving metrics). We present a new LP-rounding algorithm for facility location problems, which yields the first constant factor approximation algorithm for the dynamic facility location problem. Our algorithm installs competing exponential clocks on clients and facilities, and connects every client by the path that repeatedly follows the smallest clock in the neighborhood. The use of exponential clocks gives rise to several properties that distinguish our approach from previous LP-roundings for facility location problems. In particular, we use \emph{no clustering} and we enable clients to connect through paths of \emph{arbitrary lengths}. In fact, the clustering-free nature of our algorithm is crucial for applying our LP-rounding approach to the dynamic problem.




Furthermore, we present both empirical and theoretical aspects of the $k$-means problem. The best known algorithm for $k$-means with a provable guarantee is a simple local-search heuristic that yields an approximation guarantee of $9+\epsilon$, a ratio that is known to be tight with respect to such methods. We overcome this barrier by presenting a new primal-dual approach that enables us (1) to exploit the geometric structure of $k$-means and (2) to satisfy the hard constraint that at most $k$ clusters are selected without deteriorating the approximation guarantee. Our main result is a $6.357$-approximation algorithm with respect to the standard LP relaxation. Our techniques are quite general and we also show improved guarantees for the general version of $k$-means where the underlying metric is not required to be Euclidean and for $k$-median in Euclidean metrics.


We also improve the running time of our algorithm to almost linear running time and still maintain a provable guarantee. We compare our algorithm with {\sc K-Means++} (a widely studied algorithm) and show that we obtain better accuracy with comparable and even better running time.

  • Files
  • Details
  • Metrics
Loading...
Thumbnail Image
Name

EPFL_TH8546.pdf

Access type

openaccess

Size

3.61 MB

Format

Adobe PDF

Checksum (MD5)

f5f163368fed0b257900db0f71e8b468

Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés