In this article, we introduce a novel approach to monaural source separation that aims to separate a polyphonic musical recording into two sources: a main instrument (or melody) track and an accompaniment track. To this end, we model the power spectral densities (PSDs) of both contributions, using a source/filter model for the main instrument and a model emphasizing the temporal repetitions of the musical background. We show that separation performance improves with a two-step estimation strategy in which the model parameters are re-estimated in a second stage by exploiting the main melody line estimated in the first stage. Experiments conducted on several monaural signal databases show that our system achieves state-of-the-art performance compared to other unsupervised source separation algorithms.
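To make the modeling idea concrete, the following is a minimal NumPy sketch, not the paper's actual system: the lead instrument's PSD is built as an excitation (a harmonic comb at a hypothetical fundamental) shaped by a smooth spectral-envelope filter, the accompaniment's PSD as a low-rank product of spectral templates and repeating activations, and a Wiener-style mask then recovers the lead's share of the mixture power. All dimensions, the fundamental-frequency bin, and the envelope shape are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: F frequency bins, N time frames.
F, N = 64, 20

# Source/filter model for the lead instrument's PSD: each spectrum is an
# excitation source (here a harmonic comb) shaped by a smooth filter envelope.
f0_bin = 4                               # assumed fundamental-frequency bin
excitation = np.zeros(F)
excitation[f0_bin::f0_bin] = 1.0         # harmonics at multiples of f0
envelope = np.exp(-np.arange(F) / 20.0)  # smooth spectral-envelope filter
lead_psd = (excitation * envelope)[:, None] * np.ones(N)  # shape (F, N)

# Accompaniment PSD: low-rank model W @ H whose few basis spectra recur
# over time, capturing the temporal repetitions of the musical background.
K = 3
W = rng.random((F, K))                   # spectral templates
H = rng.random((K, N))                   # repeating activations
accomp_psd = W @ H

# The mixture PSD is modeled as the sum of the two contributions; a
# Wiener-style mask then attributes each time-frequency bin to the lead.
mix_psd = lead_psd + accomp_psd
mask = lead_psd / (mix_psd + 1e-12)
print(mask.shape)  # → (64, 20)
```

In the full system the parameters of both models would be estimated jointly from the mixture (e.g. by iterative updates), rather than fixed by hand as in this sketch.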