Covariance Mismatch in Diffusion Models
Diffusion models are usually trained with isotropic noise, yet common data distributions are strongly anisotropic. We explain how this covariance mismatch harms the model in several ways. It causes the model to predict only the high-variance components at high noise levels and only the low-variance components at low noise levels, so a wide range of noise levels is needed during training and inference to model all components accurately. This partition of components across noise levels also prevents smaller timesteps from correcting the predictions of larger timesteps, and limits diffusion editing to low-variance components only. We present two approaches for realigning the noise and data covariances: whitening the data distribution or coloring the noise distribution. We apply these approaches to 2D point distributions and, using a Fourier-based approach, to images. Realigning the covariances allows the model to focus more equally on all components, which improves editing and allows training with fewer noise levels. Models trained with realigned covariances offer greater flexibility in the choice of timesteps during inference and can even generate reasonable output when trained on just a single timestep. Project page: https://ivrl.github.io/covariance-mismatch
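The two realignment options named in the abstract, whitening the data or coloring the noise, can be illustrated on a toy 2D point distribution. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the Gaussian toy data, the sample-covariance estimate, and all variable names (`true_cov`, `whiten`, `color`) are hypothetical choices made for illustration, and the Fourier-based variant for images is not covered.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical anisotropic 2D data: one direction has much more variance.
true_cov = np.array([[4.0, 1.5],
                     [1.5, 1.0]])
x = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=20000)

# Estimate the data mean and covariance, then eigendecompose the covariance.
mean = x.mean(axis=0)
cov = np.cov(x - mean, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# Option 1 (assumed form): whiten the data with cov^(-1/2), so that
# isotropic noise matches the covariance of the transformed data.
whiten = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
x_white = (x - mean) @ whiten.T

# Option 2 (assumed form): color isotropic noise with cov^(1/2), so that
# the noise covariance matches the covariance of the untouched data.
color = eigvecs @ np.diag(eigvals ** 0.5) @ eigvecs.T
eps = rng.standard_normal(x.shape)
eps_colored = eps @ color.T

cov_white = np.cov(x_white, rowvar=False)    # close to the identity
cov_noise = np.cov(eps_colored, rowvar=False)  # close to the data covariance
```

In both cases the noise and the data end up with matching covariance; the difference is whether the data is moved to the noise (whitening, which must be inverted after sampling) or the noise is moved to the data (coloring).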
Files:
Covariance Mismatch in Diffusion Models.pdf (main document, open access, 44.74 MB, Adobe PDF, checksum 0c75962c4e31b95c3ee1624dae045db8)
Covariance Mismatch in Diffusion Models (with Appendix).pdf (supplementary material/information, open access, 139.37 MB, Adobe PDF, checksum ddd859a67da10bade71d3c9f4b20e3df)