Infoscience
conference paper

Universal Lower Bounds and Optimal Rates: Achieving Minimax Clustering Error in Sub-Exponential Mixture Models

Dreveton, Maximilien • Gözeten, Alperen • Grossglauser, Matthias • Thiran, Patrick
July 3, 2024
Proceedings of Thirty Seventh Conference on Learning Theory
The 37th Annual Conference on Learning Theory

Clustering is a pivotal challenge in unsupervised machine learning and is often investigated through the lens of mixture models. The optimal error rate for recovering cluster labels in Gaussian and sub-Gaussian mixture models involves ad hoc signal-to-noise ratios. Simple iterative algorithms, such as Lloyd's algorithm, attain this optimal error rate. In this paper, we first establish a universal lower bound for the error rate in clustering any mixture model, expressed through Chernoff information, a more versatile measure of model information than signal-to-noise ratios. We then demonstrate that iterative algorithms attain this lower bound in mixture models with sub-exponential tails, notably emphasizing location-scale mixtures featuring Laplace-distributed errors. Additionally, for datasets better modelled by Poisson or Negative Binomial mixtures, we study mixture models whose distributions belong to an exponential family. In such mixtures, we establish that Bregman hard clustering, a variant of Lloyd's algorithm employing a Bregman divergence, is rate optimal.
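The Bregman hard clustering mentioned above alternates assignment and centroid steps exactly like Lloyd's algorithm, but measures point-to-centroid distance with a Bregman divergence matched to the mixture's exponential family. The following is a minimal illustrative sketch (not the paper's exact algorithm): it uses the generalized KL divergence, which is the Bregman divergence induced by the Poisson log-partition function, and relies on the fact that the cluster mean minimizes the average Bregman divergence for any such divergence. All function names and the toy data are this sketch's own choices.

```python
import numpy as np

def kl_divergence(x, mu, eps=1e-12):
    """Generalized KL divergence, the Bregman divergence induced by the
    Poisson log-partition function (illustrative choice of divergence)."""
    x = np.maximum(x, eps)
    mu = np.maximum(mu, eps)
    return np.sum(x * np.log(x / mu) - x + mu, axis=-1)

def bregman_hard_clustering(X, k, n_iter=50, seed=0):
    """Lloyd-style alternation: assign each point to the centroid minimizing
    the Bregman divergence, then recompute each centroid as the cluster mean
    (the Bregman-optimal centroid for any Bregman divergence)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # divergence from every point to every centroid: shape (n, k)
        d = kl_divergence(X[:, None, :], centroids[None, :, :])
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# toy Poisson mixture with two well-separated rate vectors
rng = np.random.default_rng(1)
X = np.vstack([rng.poisson(2.0, size=(100, 5)),
               rng.poisson(20.0, size=(100, 5))]).astype(float)
labels, centroids = bregman_hard_clustering(X, k=2)
```

With squared Euclidean distance in place of `kl_divergence`, this reduces to standard Lloyd's algorithm; swapping the divergence is the only change needed to adapt the iteration to a different exponential-family mixture.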

Type
conference paper
ArXiv ID
2402.15432

Author(s)
Dreveton, Maximilien (EPFL)
Gözeten, Alperen
Grossglauser, Matthias (EPFL)
Thiran, Patrick (EPFL)

Date Issued
2024-07-03
Published in
Proceedings of Thirty Seventh Conference on Learning Theory
Series title/Series vol.
Proceedings of Machine Learning Research; 247
Start page
1451
End page
1485

Subjects
clustering • mixture models • k-means • iterative algorithms

Editorial or Peer reviewed
REVIEWED
Written at
EPFL
EPFL units
INDY1
INDY2
Event name
The 37th Annual Conference on Learning Theory
Event acronym
COLT 2024
Event place
Edmonton, Canada
Event date
2024-06-30 - 2024-07-03

Available on Infoscience
September 23, 2024
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/241352
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.