Infoscience

 
conference paper

Approximating Dasgupta Cost in Sublinear Time from a Few Random Seeds

Kapralov, Michael  
•
Kumar, Akash
•
Lattanzi, Silvio
Censor-Hillel, Keren
•
Grandoni, Fabrizio
June 30, 2025
52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)

Testing graph cluster structure has been a central object of study in property testing since the foundational work of Goldreich and Ron [STOC’96] on expansion testing, i.e., the problem of distinguishing between a single cluster (an expander) and a graph that is far from a single cluster. More generally, a (k, ϵ)-clusterable graph G is a graph whose vertex set admits a partition into k induced expanders, each with outer conductance bounded by ϵ. A recent line of work initiated by Czumaj, Peng and Sohler [STOC’15] has shown how to test whether a graph is close to (k, ϵ)-clusterable, and how to locally determine which cluster a given vertex belongs to with misclassification rate ≈ ϵ, but no sublinear time algorithms for learning the structure of inter-cluster connections are known. As a simple example, can one locally distinguish between the “cluster graph” forming a line and a clique? In this paper, we consider the problem of testing the hierarchical cluster structure of (k, ϵ)-clusterable graphs in sublinear time. Our measure of hierarchical clusterability is the well-established Dasgupta cost, and our main result is an algorithm that approximates the Dasgupta cost of a (k, ϵ)-clusterable graph in sublinear time, using a small number of randomly chosen seed vertices for which cluster labels are known. Specifically, we obtain an O(√(log k)) approximation to the Dasgupta cost of G in ≈ n^{1/2+O(ϵ)} time using ≈ n^{1/3} seeds, effectively giving a sublinear time simulation of the algorithm of Charikar and Chatziafratis [SODA’17] on clusterable graphs. To the best of our knowledge, ours is the first result on approximating the hierarchical clustering properties of such graphs in sublinear time.
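For readers unfamiliar with the cost measure named in the abstract, the following is a minimal sketch of the standard definition of Dasgupta cost: for a hierarchy T over the vertices, each edge {u, v} contributes its weight times the number of leaves under the lowest common ancestor of u and v. This is a brute-force evaluation on a toy graph, not the sublinear algorithm of the paper. The nested-tuple tree encoding and the names `leaves`, `dasgupta_cost`, `path_weights`, `balanced` and `caterpillar` are illustrative assumptions.

```python
from itertools import combinations


def leaves(tree):
    """Return the leaf labels under a tree given as nested tuples."""
    if not isinstance(tree, tuple):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def dasgupta_cost(tree, weights):
    """Sum over edges {u, v} of w(u, v) * (#leaves under lca(u, v)).

    `weights` maps frozenset({u, v}) to the edge weight (assumed encoding).
    """
    if not isinstance(tree, tuple):
        return 0.0  # a single leaf separates no pair
    child_leaves = [leaves(child) for child in tree]
    size = sum(len(ls) for ls in child_leaves)
    # Pairs inside one child are charged deeper in the recursion.
    cost = sum(dasgupta_cost(child, weights) for child in tree)
    # Pairs split between two children have this node as their lca.
    for a, b in combinations(child_leaves, 2):
        for u in a:
            for v in b:
                cost += weights.get(frozenset((u, v)), 0.0) * size
    return cost


# Toy input: the unweighted path 0 - 1 - 2 - 3.
path_weights = {frozenset((i, i + 1)): 1.0 for i in range(3)}
balanced = ((0, 1), (2, 3))
caterpillar = (((0, 1), 2), 3)

print(dasgupta_cost(balanced, path_weights))     # 8.0
print(dasgupta_cost(caterpillar, path_weights))  # 9.0
```

On this path the balanced hierarchy is cheaper (8 versus 9) because only the middle edge is cut at the root, where every separated edge is charged the full vertex count.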

Name: LIPIcs.ICALP.2025.103.pdf
Type: Main Document
Version: Published version
Access type: openaccess
License Condition: CC BY
Size: 935.53 KB
Format: Adobe PDF
Checksum (MD5): 830d9cb5cc2cf93425d77fc83edca69c


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved