The ICSI RT-09 Speaker Diarization System

Friedland, Gerald; Janin, Adam; Imseng, David; Anguera Miro, Xavier; Gottlieb, Luke; Huijbregts, Marijn; Knox, Mary Tai; Vinyals, Oriol

doi:10.1109/TASL.2011.2158419

2012

Abstract

The speaker diarization system developed at the International Computer Science Institute (ICSI) has played a prominent role in the speaker diarization community, and many researchers in the rich transcription community have adopted methods and techniques developed for the ICSI speaker diarization engine. Although there have been many related publications over the years, previous articles only presented changes and improvements rather than a description of the full system. Attempting to replicate the ICSI speaker diarization system as a complete entity would require an extensive literature review, and might ultimately fail due to component description version mismatches. This paper therefore presents the first full conceptual description of the ICSI speaker diarization system as presented to the National Institute of Standards and Technology Rich Transcription 2009 (NIST RT-09) evaluation, which consists of online and offline subsystems, multi-stream and single-stream implementations, and audio and audio-visual approaches. Some of the components, such as the online system, have not been previously described. The paper also includes all necessary preprocessing steps, such as Wiener filtering, speech activity detection and beamforming.

Details

Title The ICSI RT-09 Speaker Diarization System

Author(s) Friedland, Gerald ; Janin, Adam ; Imseng, David ; Anguera Miro, Xavier ; Gottlieb, Luke ; Huijbregts, Marijn ; Knox, Mary Tai ; Vinyals, Oriol

Published in IEEE Transactions on Audio, Speech, and Language Processing

Volume 20

Issue 2

Pages 371-381

Date 2012

ISSN 1558-7916

Keywords

Gaussian mixture models (GMMs); machine learning; speaker diarization

DOI https://doi.org/10.1109/TASL.2011.2158419

Laboratories LIDIAP

Record Appears in Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LIDIAP - L'IDIAP Laboratory
Scientific production and competences > Euler Center for Signal Processing
Peer-reviewed publications
Work produced at EPFL
Journal Articles
Published

Record creation date 2012-03-01