Conference paper

A Bi-Criteria Approximation Algorithm for k-Means

We consider the classical k-means clustering problem in the setting of bi-criteria approximation, in which an algorithm is allowed to output beta*k > k clusters, and must produce a clustering with cost at most alpha times the to the cost of the optimal set of k clusters. We argue that this approach is natural in many settings, for which the exact number of clusters is a priori unknown, or unimportant up to a constant factor. We give new bi-criteria approximation algorithms, based on linear programming and local search, respectively, which attain a guarantee alpha(beta) depending on the number beta*k of clusters that may be opened. Our guarantee alpha(beta) is always at most 9 + epsilon and improves rapidly with beta (for example: alpha(2) < 2.59, and alpha(3) < 1.4). Moreover, our algorithms have only polynomial dependence on the dimension of the input data, and so are applicable in high-dimensional settings.


Related material