Source/Filter Model for Unsupervised Main Melody Extraction From Polyphonic Audio Signals

Extracting the main melody from a polyphonic music recording seems natural even to untrained human listeners. To a certain extent it is related to the concept of source separation, with the human ability to focus on a specific source in order to extract relevant information. In this paper, we propose a new approach for the estimation and extraction of the main melody (and in particular the leading vocal part) from polyphonic audio signals. To that end, we propose a new signal model where the leading vocal part is explicitly represented by a specific source/filter model. The proposed representation is investigated in the framework of two statistical models: a Gaussian Scaled Mixture Model (GSMM) and an extended Instantaneous Mixture Model (IMM). For both models, the model parameters are estimated within a maximum-likelihood framework adapted from single-channel source separation techniques. The desired sequence of fundamental frequencies is then inferred from the estimated parameters. The results obtained in a recent evaluation campaign (MIREX08) show that the proposed approaches are very promising and reach state-of-the-art performance on all test sets.


Published in:
IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 3, pp. 564-575
Year:
2010
Publisher:
Institute of Electrical and Electronics Engineers
ISSN:
1558-7916