An Information Theoretic Combination of MFCC and TDOA Features for Speaker Diarization

Vijayasenan, Deepu; Valente, Fabio; Bourlard, Hervé

2010

Abstract

This work describes a novel system for speaker diarization of meeting recordings based on the combination of acoustic features (MFCC) and Time Delay of Arrival (TDOA) features. The first part of the paper analyzes the differences between MFCC and TDOA features, which possess completely different statistical properties. When Gaussian Mixture Models (GMMs) are used, experiments reveal that the diarization system is sensitive to the recording scenario (i.e., meeting rooms with a varying number of microphones). In the second part, a new multistream diarization system is proposed, extending previous work on Information Theoretic diarization. Both the speaker clustering and speaker realignment steps are discussed; in contrast to current systems, the proposed method avoids combining features by averaging log-likelihood scores. Experiments on meeting data reveal that the proposed approach outperforms the GMM-based system when recordings are made with a varying number of microphones.

Details

Title An Information Theoretic Combination of MFCC and TDOA Features for Speaker Diarization

Author(s) Vijayasenan, Deepu; Valente, Fabio; Bourlard, Hervé

Date 2010

Publisher Idiap

Laboratories LIDIAP

Record Appears in Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LIDIAP - L'IDIAP Laboratory
Scientific production and competences > Euler Center for Signal Processing
Work produced at EPFL
Technical Reports
Published

Record creation date 2010-08-26