Clustering And Segmenting Speakers And Their Locations In Meetings

Ajmera, Jitendra; Lathoud, Guillaume; McCowan, Iain A.

2003

Abstract

This paper presents a new approach to the automatic annotation of meetings in terms of speaker identities and their locations. This is achieved by segmenting the audio recordings using two independent sources of information: magnitude spectrum analysis and sound source localization. We combine the two in an appropriate HMM framework. This approach has three main advantages. First, it is completely unsupervised, i.e. speaker identities and the number of speakers and locations are inferred automatically. Second, it is threshold-free, i.e. decisions are made without a threshold value, which would generally require an additional development dataset. Third, the joint segmentation improves over the speaker segmentation derived using only acoustic features. Experiments on a series of meetings recorded in the IDIAP Smart Meeting Room demonstrate the effectiveness of this approach.

Details

Title: Clustering And Segmenting Speakers And Their Locations In Meetings

Author(s): Ajmera, Jitendra; Lathoud, Guillaume; McCowan, Iain A.

Date: 2003

Publisher: Martigny, Switzerland, IDIAP

Keywords: speech; ajmera; lathoud; mccowan

Laboratories: LIDIAP

Record appears in: Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LIDIAP - L'IDIAP Laboratory
Scientific production and competences > Euler Center for Signal Processing
Work produced at EPFL
Technical Reports
Published

Record creation date: 2006-03-10
