Approximating Dasgupta Cost in Sublinear Time from a Few Random Seeds
Testing graph cluster structure has been a central object of study in property testing since the foundational work of Goldreich and Ron [STOC’96] on expansion testing, i.e., the problem of distinguishing between a single cluster (an expander) and a graph that is far from a single cluster. More generally, a (k, ϵ)-clusterable graph G is a graph whose vertex set admits a partition into k induced expanders, each with outer conductance bounded by ϵ. A recent line of work initiated by Czumaj, Peng and Sohler [STOC’15] has shown how to test whether a graph is close to (k, ϵ)-clusterable, and to locally determine which cluster a given vertex belongs to with misclassification rate ≈ ϵ, but no sublinear-time algorithms for learning the structure of inter-cluster connections are known. As a simple example, can one locally distinguish between a “cluster graph” forming a line and one forming a clique?

In this paper, we consider the problem of testing the hierarchical cluster structure of (k, ϵ)-clusterable graphs in sublinear time. Our measure of hierarchical clusterability is the well-established Dasgupta cost, and we give an algorithm that approximates the Dasgupta cost of a (k, ϵ)-clusterable graph in sublinear time, using a small number of randomly chosen seed vertices for which cluster labels are known. Specifically, our main result is an O(√log k)-approximation to the Dasgupta cost of G in ≈ n^{1/2+O(ϵ)} time using ≈ n^{1/3} seeds, effectively giving a sublinear-time simulation of the algorithm of Charikar and Chatziafratis [SODA’17] on clusterable graphs. To the best of our knowledge, ours is the first result on approximating the hierarchical clustering properties of such graphs in sublinear time.
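For readers unfamiliar with the objective: Dasgupta's cost of a hierarchy T on a weighted graph is the sum, over edges (u, v), of w(u, v) times the number of leaves under the least common ancestor of u and v in T. The brute-force sketch below is purely illustrative (it is not the paper's sublinear algorithm, and the function name, tuple-based tree encoding, and edge format are our own assumptions):

```python
def leaves(t):
    """Return the set of leaf labels of a binary tree encoded as nested 2-tuples."""
    if not isinstance(t, tuple):
        return {t}
    return leaves(t[0]) | leaves(t[1])

def dasgupta_cost(tree, edges):
    """Dasgupta cost: sum over edges (u, v, w) of w * |leaves(LCA_tree(u, v))|.

    `tree` is a binary hierarchy as nested 2-tuples; `edges` is a list of
    (u, v, w) triples. Runs in time polynomial in the tree size -- this is a
    toy reference implementation, not a sublinear-time procedure.
    """
    total = 0
    for u, v, w in edges:
        node = tree
        # Descend while some child still contains both endpoints; the node
        # where the descent stops is the least common ancestor of u and v.
        while isinstance(node, tuple):
            left, right = leaves(node[0]), leaves(node[1])
            if u in left and v in left:
                node = node[0]
            elif u in right and v in right:
                node = node[1]
            else:
                break
        total += w * len(leaves(node))
    return total

# Unit-weight path 0-1-2-3: the balanced tree is cheaper than the caterpillar.
path = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
print(dasgupta_cost(((0, 1), (2, 3)), path))    # balanced: 2 + 4 + 2 = 8
print(dasgupta_cost((((0, 1), 2), 3), path))    # caterpillar: 2 + 3 + 4 = 9
```

The toy example shows why the objective favors cutting sparse inter-cluster edges near the root: the balanced tree charges the middle edge the full leaf count only once, which is the behavior the recursive sparsest-cut approach of Charikar and Chatziafratis exploits.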
LIPIcs.ICALP.2025.103.pdf