A Tractable Framework for Estimating and Combining Spectral Source Models for Audio Source Separation

Arberet, Simon; Ozerov, Alexey; Bimbot, Frédéric; Gribonval, Rémi

doi:10.1016/j.sigpro.2011.12.022

2012

Abstract

The underdetermined blind audio source separation (BSS) problem is often addressed in the time-frequency (TF) domain by assuming that each TF point is modeled as an independent random variable with a sparse distribution. Methods based on structured spectral models, such as the Spectral Gaussian Scaled Mixture Models (Spectral-GSMMs) or Spectral Non-negative Matrix Factorization models, perform better because they exploit the statistical diversity of audio source spectrograms, which lets them go beyond the simple sparsity assumption. However, for discrete state-based models such as Spectral-GSMMs, learning the models from the mixture can be computationally very expensive. One of the main problems is that a classical Expectation-Maximization procedure often leads to a complexity that is exponential in the number of sources. In this paper, we propose a framework with linear complexity to learn spectral source models (including discrete state-based models) from noisy source estimates. Moreover, this framework allows different probabilistic models to be combined, which can be seen as a form of probabilistic fusion. We illustrate that methods based on this framework can significantly improve BSS performance compared to state-of-the-art approaches. © 2012 Elsevier B.V. All rights reserved.
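
The complexity gap mentioned in the abstract can be illustrated with a small counting sketch. This is not the paper's algorithm; it only assumes that each of N sources has a discrete-state model with K states, that a joint EM step over the mixture has to consider every combination of source states, and that learning each source model separately from its own noisy estimate only visits that source's states. The function names and the choice K = 8 are hypothetical.

```python
def joint_state_count(num_sources: int, states_per_source: int) -> int:
    """State combinations a joint E-step would enumerate (assumed: K ** N)."""
    return states_per_source ** num_sources


def per_source_state_count(num_sources: int, states_per_source: int) -> int:
    """States visited when each source model is fit separately (assumed: N * K)."""
    return num_sources * states_per_source


if __name__ == "__main__":
    K = 8  # hypothetical number of states per source model
    for n in (2, 3, 4, 5):
        print(n, joint_state_count(n, K), per_source_state_count(n, K))
```

Under these assumptions the joint count grows exponentially with the number of sources while the per-source count grows linearly, which is the contrast the abstract draws.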

Details

Title A Tractable Framework for Estimating and Combining Spectral Source Models for Audio Source Separation

Author(s) Arberet, Simon ; Ozerov, Alexey ; Bimbot, Frédéric ; Gribonval, Rémi

Published in Signal Processing

Volume 92

Issue 8

Pages 1886-1901

Date 2012

Keywords

Blind source separation; multichannel audio; Gaussian mixture model; expectation-maximization algorithm; convolutive mixture; LTS2

Note Special issue on "Latent Variable Analysis and Signal Separation"

DOI https://doi.org/10.1016/j.sigpro.2011.12.022

Laboratories LTS2

Record Appears in Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LTS2 - Signal Processing Laboratory 2
Peer-reviewed publications
Work produced at EPFL
Journal Articles
Published

Record creation date 2011-05-18
