Model Non-Collapse: Minimax Bounds for Recursive Discrete Distribution Estimation
Learning discrete distributions from i.i.d. samples is a well-understood problem. However, advances in generative machine learning prompt an interesting new, non-i.i.d. setting: after a certain number of samples has been received, an estimate of the distribution is fixed, and samples drawn from this estimate are introduced into the sample corpus, undifferentiated from real samples. Subsequent generations of estimators then face contaminated environments, a scenario referred to in the machine learning literature as self-consumption. Empirically, models in fully synthetic self-consuming loops have been observed to collapse (their performance deteriorates with each batch of training), whereas accumulating data has been shown to prevent complete degeneration. This, in turn, raises the question: what happens when fresh real samples are added at every stage? In this paper, we study the minimax loss of self-consuming discrete distribution estimation in such loops. We show that even when model collapse is consciously averted, the ratio between the minimax losses with and without source information can grow without bound as the batch size increases. In the data accumulation setting, where all batches of samples are available for estimation, we provide minimax lower and upper bounds that are order-optimal under mild conditions for the expected ℓ₂² and ℓ₁ losses at every stage. We provide conditions and examples for regimes where the convergence rates exhibit a strict gap relative to the corresponding oracle-assisted minimax loss, in which real and synthetic samples are differentiated. We also provide a lower bound on the minimax loss in the data replacement setting, where only the latest batch of samples is available, and use it to derive a lower bound on the worst-case loss for bounded estimate trajectories.
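To make the setting concrete, below is a minimal simulation sketch of such a self-consuming loop. It is illustrative only: the paper's estimators and minimax analysis are not reproduced here, and the plain empirical (frequency) estimator, the ℓ₁ loss, and all names (`self_consuming_loop`, `n_real`, `n_synth`, `stages`) are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_estimate(samples, k):
    """Empirical (frequency) estimator over a k-symbol alphabet."""
    counts = np.bincount(samples, minlength=k)
    return counts / counts.sum()

def self_consuming_loop(p, n_real, n_synth, stages, accumulate=True):
    """Simulate recursive estimation: at each stage, draw fresh real
    samples and synthetic samples from the previous (fixed) estimate,
    mix them undifferentiated, and re-estimate."""
    k = len(p)
    corpus = rng.choice(k, size=n_real, p=p)  # stage-0 batch is purely real
    p_hat = empirical_estimate(corpus, k)
    losses = [np.abs(p_hat - p).sum()]  # ell_1 loss at stage 0
    for _ in range(1, stages):
        real = rng.choice(k, size=n_real, p=p)        # fresh real samples
        synth = rng.choice(k, size=n_synth, p=p_hat)  # samples from the estimate
        batch = np.concatenate([real, synth])
        # accumulation keeps all batches; replacement keeps only the latest
        corpus = np.concatenate([corpus, batch]) if accumulate else batch
        p_hat = empirical_estimate(corpus, k)
        losses.append(np.abs(p_hat - p).sum())
    return losses

p = np.array([0.5, 0.3, 0.15, 0.05])  # hypothetical source distribution
print(self_consuming_loop(p, n_real=200, n_synth=200, stages=5))
```

The `accumulate` flag switches between the data accumulation setting (all batches kept) and the data replacement setting (only the latest batch kept), the two regimes the abstract contrasts.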
Affiliation: École Polytechnique Fédérale de Lausanne
Publication date: 2025-06-22
ISBN: 979-8-3315-4399-0
Pages: 1-6
Review status: REVIEWED
Institution: EPFL
| Event name | Event acronym | Event place | Event date |
| --- | --- | --- | --- |
| IEEE International Symposium on Information Theory | ISIT 2025 | Ann Arbor, MI, USA | 2025-06-22 - 2025-06-27 |