We propose a novel non-linear video diffusion approach that focuses on the parts of a video sequence relevant to audio-visual analysis. The diffusion process is controlled by a diffusion coefficient based on an estimate of the synchrony between video motion and audio energy at each point of the video volume. Regions whose motion is not coherent with the soundtrack are thus iteratively smoothed. We carefully study the discretization of the proposed continuous diffusion formulation and demonstrate its stability. Our approach is tested in challenging situations involving sequence degradation and distracting video motion. Results show that in all cases our method keeps the focus of attention on the sound sources.
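
To illustrate the general idea of synchrony-controlled smoothing, the following is a minimal sketch and not the formulation studied in this work. It assumes a generic scalar diffusion equation dv/dt = div(g grad v) on a (T, H, W) video volume, an explicit finite-difference scheme with zero-flux boundaries, and a hypothetical diffusivity g = 1 - s, where s in [0, 1] is some per-voxel audio-visual synchrony estimate. The names `diffusivity_from_synchrony`, `diffuse_step`, and `diffuse`, as well as the step size `tau`, are illustrative assumptions.

```python
import numpy as np


def diffusivity_from_synchrony(s):
    """Hypothetical mapping: high synchrony -> low diffusion (assumption)."""
    return 1.0 - np.clip(s, 0.0, 1.0)


def diffuse_step(v, g, tau=0.1):
    """One explicit step of dv/dt = div(g * grad v) with zero-flux borders.

    v : video volume, shape (T, H, W)
    g : diffusivity in [0, 1], same shape as v
    tau : step size; for this simple scheme tau <= 1 / (2 * v.ndim)
          keeps every update a convex combination of neighbours.
    """
    out = v.astype(float).copy()
    for axis in range(v.ndim):
        lo = [slice(None)] * v.ndim
        hi = [slice(None)] * v.ndim
        lo[axis] = slice(None, -1)
        hi[axis] = slice(1, None)
        # Diffusivity at half-grid points: mean of the two adjacent voxels.
        g_half = 0.5 * (g[tuple(lo)] + g[tuple(hi)])
        # Flux between adjacent voxels along this axis.
        flux = g_half * np.diff(v, axis=axis)
        # Pad with zeros so that no flux crosses the volume boundary.
        pad = [(0, 0)] * v.ndim
        pad[axis] = (1, 1)
        flux = np.pad(flux, pad)
        # Divergence: incoming minus outgoing flux at each voxel.
        out += tau * np.diff(flux, axis=axis)
    return out


def diffuse(v, s, steps=50, tau=0.1):
    """Iterate the explicit scheme with a fixed synchrony map s."""
    g = diffusivity_from_synchrony(s)
    for _ in range(steps):
        v = diffuse_step(v, g, tau)
    return v
```

In this sketch, voxels whose neighbourhood has synchrony close to 1 are left essentially untouched, while voxels with low synchrony are progressively averaged with their spatio-temporal neighbours.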