Algorithms For Clustering Problems: Theoretical Guarantees and Empirical Evaluations

Norouzi Fard, Ashkan

doi:10.5075/epfl-thesis-8546

Norouzi Fard, Ashkan

2018

Download

Formats

Format
BibTeX
MARCXML
TextMARC
MARC
DublinCore
EndNote
NLM
RefWorks
RIS

Files

Abstract

Clustering is a classic topic in combinatorial optimization and plays a central role in many areas, including data science and machine learning. In this thesis, we first focus on the dynamic facility location problem (i.e., the facility location problem in evolving metrics). We present a new LP-rounding algorithm for facility location problems, which yields the first constant factor approximation algorithm for the dynamic facility location problem. Our algorithm installs competing exponential clocks on clients and facilities, and connects every client by the path that repeatedly follows the smallest clock in the neighborhood. The use of exponential clocks gives rise to several properties that distinguish our approach from previous LP-roundings for facility location problems. In particular, we use \emph{no clustering} and we enable clients to connect through paths of \emph{arbitrary lengths}. In fact, the clustering-free nature of our algorithm is crucial for applying our LP-rounding approach to the dynamic problem. Furthermore, we present both empirical and theoretical aspects of the $k$-means problem. The best known algorithm for $k$-means with a provable guarantee is a simple local-search heuristic that yields an approximation guarantee of $9+\epsilon$, a ratio that is known to be tight with respect to such methods. We overcome this barrier by presenting a new primal-dual approach that enables us (1) to exploit the geometric structure of $k$-means and (2) to satisfy the hard constraint that at most $k$ clusters are selected without deteriorating the approximation guarantee. Our main result is a $6.357$-approximation algorithm with respect to the standard LP relaxation. Our techniques are quite general and we also show improved guarantees for the general version of $k$-means where the underlying metric is not required to be Euclidean and for $k$-median in Euclidean metrics. We also improve the running time of our algorithm to almost linear running time and still maintain a provable guarantee. We compare our algorithm with {\sc K-Means++} (a widely studied algorithm) and show that we obtain better accuracy with comparable and even better running time.

Details

Title Algorithms For Clustering Problems: Theoretical Guarantees and Empirical Evaluations

Author(s) Norouzi Fard, Ashkan

Advisor(s)

Svensson, Ola Nils Anders

Pagination 152

Date 2018

Publisher Lausanne, EPFL

Keywords

Linear Programming; Approximation Algorithms; Clustering; Facility Location Problem; k-Median; k-Means

Language English

DOI https://doi.org/10.5075/epfl-thesis-8546

Laboratories THL2

Record Appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > THL2 - Theory of Computation Laboratory 2
Scientific production and competences > EPFL Theses
Work produced at EPFL
Published
Theses

Record creation date 2018-07-10

Files

Abstract

Details

PDF