Infoscience
conference paper

Universal Lower Bounds and Optimal Rates: Achieving Minimax Clustering Error in Sub-Exponential Mixture Models

Dreveton, Maximilien • Gözeten, Alperen • Grossglauser, Matthias • Thiran, Patrick
July 3, 2024
Proceedings of Thirty Seventh Conference on Learning Theory
The 37th Annual Conference on Learning Theory

Clustering is a pivotal challenge in unsupervised machine learning and is often investigated through the lens of mixture models. The optimal error rate for recovering cluster labels in Gaussian and sub-Gaussian mixture models involves ad hoc signal-to-noise ratios. Simple iterative algorithms, such as Lloyd's algorithm, attain this optimal error rate. In this paper, we first establish a universal lower bound for the error rate in clustering any mixture model, expressed through Chernoff information, a more versatile measure of model information than signal-to-noise ratios. We then demonstrate that iterative algorithms attain this lower bound in mixture models with sub-exponential tails, notably emphasizing location-scale mixtures featuring Laplace-distributed errors. Additionally, for datasets better modelled by Poisson or Negative Binomial mixtures, we study mixture models whose distributions belong to an exponential family. In such mixtures, we establish that Bregman hard clustering, a variant of Lloyd's algorithm employing a Bregman divergence, is rate optimal.
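The Bregman hard clustering mentioned above alternates assignment and centroid steps exactly like Lloyd's algorithm, but measures point-to-centroid distance with a Bregman divergence matched to the mixture's exponential family. The following is a minimal illustrative sketch (not the paper's exact algorithm): it uses the generalized KL divergence, which is the Bregman divergence induced by the Poisson log-partition function, and relies on the fact that the cluster mean minimizes the average Bregman divergence for any such divergence. All function names and the toy data are this sketch's own choices.

```python
import numpy as np

def kl_divergence(x, mu, eps=1e-12):
    """Generalized KL divergence, the Bregman divergence induced by the
    Poisson log-partition function (illustrative choice of divergence)."""
    x = np.maximum(x, eps)
    mu = np.maximum(mu, eps)
    return np.sum(x * np.log(x / mu) - x + mu, axis=-1)

def bregman_hard_clustering(X, k, n_iter=50, seed=0):
    """Lloyd-style alternation: assign each point to the centroid minimizing
    the Bregman divergence, then recompute each centroid as the cluster mean
    (the Bregman-optimal centroid for any Bregman divergence)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # divergence from every point to every centroid: shape (n, k)
        d = kl_divergence(X[:, None, :], centroids[None, :, :])
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# toy Poisson mixture with two well-separated rate vectors
rng = np.random.default_rng(1)
X = np.vstack([rng.poisson(2.0, size=(100, 5)),
               rng.poisson(20.0, size=(100, 5))]).astype(float)
labels, centroids = bregman_hard_clustering(X, k=2)
```

With squared Euclidean distance in place of `kl_divergence`, this reduces to standard Lloyd's algorithm; swapping the divergence is the only change needed to adapt the iteration to a different exponential-family mixture.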

Type
conference paper
ArXiv ID
2402.15432

Author(s)
Dreveton, Maximilien (EPFL)
Gözeten, Alperen
Grossglauser, Matthias (EPFL)
Thiran, Patrick (EPFL)

Date Issued
2024-07-03
Published in
Proceedings of Thirty Seventh Conference on Learning Theory
Series title/Series vol.
Proceedings of Machine Learning Research; 247
Start page
1451
End page
1485

Subjects
clustering • mixture models • k-means • iterative algorithms

Editorial or Peer reviewed
REVIEWED
Written at
EPFL
EPFL units
INDY1
INDY2
Event name
The 37th Annual Conference on Learning Theory
Event acronym
COLT 2024
Event place
Edmonton, Canada
Event date
2024-06-30 - 2024-07-03

Available on Infoscience
September 23, 2024
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/241352
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.