An Adaptive Initialization Method for Speaker Diarization based on Prosodic Features

Imseng, David; Friedland, Gerald

doi:10.1109/ICASSP.2010.5495102

2010

Abstract

This article presents a novel, adaptive initialization scheme that can be applied to most state-of-the-art Speaker Diarization algorithms, i.e., algorithms that use agglomerative hierarchical clustering with the Bayesian Information Criterion (BIC) and Gaussian Mixture Models (GMMs) of frame-based cepstral features (MFCCs). The initialization method combines the recently proposed “adaptive seconds per Gaussian” (ASPG) method with a new method, based on prosodic features, for pre-clustering and for estimating the number of initial clusters. The presented initialization method has two important advantages. First, it requires no manual tuning and is robust against variations in file length and speaker count. Second, it outperforms our previously used initialization methods on all benchmark files that were presented in the 2006, 2007, and 2009 NIST Rich Transcription (RT) evaluations, with a relative Diarization Error Rate (DER) improvement of up to 67%.
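
As an illustration of what a relative DER improvement means, the following is a minimal sketch in Python. The function name and the numeric values are hypothetical and chosen only for the arithmetic; they are not results from the paper.

```python
def relative_der_improvement(baseline_der: float, new_der: float) -> float:
    """Return the DER reduction as a percentage of the baseline DER.

    Both arguments are DER values in percent. The name and signature are
    illustrative assumptions, not part of any published toolkit.
    """
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return 100.0 * (baseline_der - new_der) / baseline_der


# Hypothetical values, not taken from the paper: a baseline DER of 30.0%
# that drops to 9.9% is an absolute gain of 20.1 points and a relative
# improvement of roughly 67%.
improvement = relative_der_improvement(30.0, 9.9)
print(f"relative improvement: {improvement:.1f}%")
```

A relative figure depends on the baseline: the same absolute reduction in DER gives a larger relative improvement when the baseline DER is small.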

Details

Title An Adaptive Initialization Method for Speaker Diarization based on Prosodic Features

Author(s) Imseng, David ; Friedland, Gerald

Published in 2010 IEEE International Conference on Acoustics, Speech and Signal Processing

Pages 4946-4949

Conference IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, USA

Date 2010

Keywords

Gaussian Mixture Models; Speaker Diarization; Prosodic features

DOI https://doi.org/10.1109/ICASSP.2010.5495102

Laboratories LIDIAP

Record Appears in Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LIDIAP - L'IDIAP Laboratory
Scientific production and competences > Euler Center for Signal Processing
Conference Papers
Work produced at EPFL
Published

Record creation date 2010-02-11
