Minimum Mutual Information Beamforming for Simultaneous Active Speakers

Kumatani, Kenichi; Mayer, Uwe; Gehrig, Tobias; Stoimenov, Emilian; McDonough, John; Wölfel, Matthias

2007

Abstract

In this work, we consider an acoustic beamforming application where two speakers are simultaneously active. We construct one subband-domain beamformer in \emph{generalized sidelobe canceller} (GSC) configuration for each source. In contrast to normal practice, we then jointly optimize the \emph{active weight vectors} of both GSCs to obtain two output signals with \emph{minimum mutual information} (MMI). Assuming that the subband snapshots are Gaussian-distributed, this MMI criterion reduces to the requirement that the \emph{cross-correlation coefficient} of the subband outputs of the two GSCs vanishes. We also compare separation performance under the Gaussian assumption with that obtained from several super-Gaussian probability density functions (pdfs), namely, the Laplace, $K_0$, and $\Gamma$ pdfs. Our proposed technique provides effective nulling of the undesired source, but without the signal cancellation problems seen in conventional beamforming. Moreover, our technique does not suffer from the source permutation and scaling ambiguities encountered in conventional blind source separation algorithms. We demonstrate the effectiveness of our proposed technique through a series of far-field automatic speech recognition experiments on data from the \emph{PASCAL Speech Separation Challenge} (SSC). On the SSC development data, the simple delay-and-sum beamformer achieves a word error rate (WER) of 70.4\%. The MMI beamformer under a Gaussian assumption achieves a 55.2\% WER, which is further reduced to 52.0\% with a $K_0$ pdf, whereas the WER for data recorded with a close-talking microphone is 21.6\%.
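The Gaussian case described in the abstract can be illustrated with a short numerical sketch. It assumes zero-mean, circularly symmetric complex Gaussian subband outputs, for which the mutual information is $-\log(1 - |\rho|^2)$, where $\rho$ is the cross-correlation coefficient of the two outputs; the mutual information is therefore zero exactly when $\rho$ vanishes. The function names, the synthetic signals, and the leakage factor of 0.5 are hypothetical and chosen only for illustration. The sketch evaluates the criterion on given outputs; it does not reproduce the report's joint optimization of the active weight vectors.

```python
import numpy as np


def cross_correlation_coefficient(y1, y2):
    """Sample cross-correlation coefficient of two zero-mean complex sequences."""
    y1 = np.asarray(y1, dtype=complex)
    y2 = np.asarray(y2, dtype=complex)
    cross = np.mean(y1 * np.conj(y2))      # estimate of E[y1 y2*]
    power1 = np.mean(np.abs(y1) ** 2)      # estimate of E[|y1|^2]
    power2 = np.mean(np.abs(y2) ** 2)      # estimate of E[|y2|^2]
    return cross / np.sqrt(power1 * power2)


def gaussian_mutual_information(y1, y2):
    """Mutual information in nats, ASSUMING circular complex Gaussian outputs."""
    rho = cross_correlation_coefficient(y1, y2)
    return -np.log(1.0 - np.abs(rho) ** 2)


# Synthetic stand-ins for the subband outputs of two beamformers (assumption:
# these are random test signals, not data from the report).
rng = np.random.default_rng(0)
num_frames = 20000
s1 = (rng.standard_normal(num_frames) + 1j * rng.standard_normal(num_frames)) / np.sqrt(2)
s2 = (rng.standard_normal(num_frames) + 1j * rng.standard_normal(num_frames)) / np.sqrt(2)

# Outputs with no leakage between sources: rho is close to 0, so MI is close to 0.
mi_separated = gaussian_mutual_information(s1, s2)

# Outputs where each beamformer leaks half of the other source's amplitude.
leaky1 = s1 + 0.5 * s2
leaky2 = s2 + 0.5 * s1
mi_leaky = gaussian_mutual_information(leaky1, leaky2)

print("MI, separated outputs:", mi_separated)
print("MI, leaky outputs:    ", mi_leaky)
```

In this toy setting the leaky outputs have a theoretical cross-correlation coefficient of 0.8, so their mutual information is clearly positive, while the separated outputs give a value near zero. Driving the coefficient to zero is the condition the abstract states for the Gaussian case.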

Details

Title Minimum Mutual Information Beamforming for Simultaneous Active Speakers

Author(s) Kumatani, Kenichi ; Mayer, Uwe ; Gehrig, Tobias ; Stoimenov, Emilian ; McDonough, John ; Wölfel, Matthias

Date 2007

Publisher IDIAP

Laboratories LIDIAP

The document appears in Work produced at EPFL
Technical reports
Published

Record created 2010-02-11
