Robust HMM-Based Speech/Music Segmentation

Ajmera, Jitendra; McCowan, Iain A.; Bourlard, Hervé

2001

Abstract

In this paper we present a new approach towards high performance speech/music segmentation on realistic tasks related to the automatic transcription of broadcast news. In the approach presented here, local probability density function (PDF) estimators trained on clean microphone speech are used as a channel model, at the output of which the entropy and "dynamism" are measured and integrated over time through a 2-state (speech and non-speech) hidden Markov model (HMM) with minimum duration constraints. The parameters of the HMM are trained using the EM algorithm in a completely unsupervised manner. Different experiments, including a variety of speech and music styles, as well as different segment durations of speech and music signals (real data distribution, mostly speech, or mostly music), illustrate the robustness of the approach, which in each case achieves a frame-level accuracy greater than 94%.
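
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: entropy is taken as the Shannon entropy of each frame's posterior vector, "dynamism" as the squared change of that vector between consecutive frames, and the minimum duration constraint is realised by expanding each class into a chain of sub-states. All function names, the transition probabilities, and the per-frame log-likelihood input `log_likes` are hypothetical. EM training of the HMM parameters is not shown.

```python
import math

def frame_entropy(posteriors):
    """Shannon entropy (bits) of one frame's class posteriors (assumed definition)."""
    return -sum(p * math.log2(p) for p in posteriors if p > 0.0)

def frame_dynamism(current, following):
    """Squared change of the posterior vector between two frames (assumed definition)."""
    return sum((a - b) ** 2 for a, b in zip(current, following))

def entropy_dynamism_features(posterior_frames):
    """One (entropy, dynamism) pair per frame, except the last frame."""
    return [
        (frame_entropy(cur), frame_dynamism(cur, nxt))
        for cur, nxt in zip(posterior_frames, posterior_frames[1:])
    ]

def viterbi_min_duration(log_likes, min_dur,
                         log_stay=math.log(0.9), log_switch=math.log(0.1)):
    """Decode a 2-class label sequence with a minimum segment duration.

    log_likes: list of (loglik_class0, loglik_class1), one pair per frame
               (hypothetical input, e.g. from models of the features above).
    Each class is a left-to-right chain of min_dur sub-states sharing one
    emission score; only the last sub-state may loop or switch class.
    The final segment may be shorter than min_dur because decoding may
    end in any sub-state.
    """
    n_states = 2 * min_dur          # state s belongs to class s // min_dur
    neg_inf = float("-inf")
    score = [neg_inf] * n_states
    score[0] = log_likes[0][0]          # start in first sub-state of class 0
    score[min_dur] = log_likes[0][1]    # or first sub-state of class 1
    back = []
    for frame in log_likes[1:]:
        new_score = [neg_inf] * n_states
        pointers = [0] * n_states
        for s in range(n_states):
            cls, pos = divmod(s, min_dur)
            candidates = []
            if pos > 0:
                # forced advance along the chain of the same class
                candidates.append((score[s - 1], s - 1))
            else:
                # enter this class from the last sub-state of the other class
                other_last = (1 - cls) * min_dur + min_dur - 1
                candidates.append((score[other_last] + log_switch, other_last))
            if pos == min_dur - 1:
                # self-loop once the minimum duration has been served
                candidates.append((score[s] + log_stay, s))
            best, prev = max(candidates)
            new_score[s] = best + frame[cls]
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # backtrack from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [s // min_dur for s in path]
```

In this sketch a one-frame excursion towards the other class is absorbed when `min_dur` is larger than one frame, which is the smoothing effect a minimum duration constraint is meant to provide.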

Details

Title Robust HMM-Based Speech/Music Segmentation

Author(s) Ajmera, Jitendra ; McCowan, Iain A. ; Bourlard, Hervé

Date 2001

Publisher Martigny, Switzerland, IDIAP

Keywords

speech; ajmera; mccowan; bourlard

Note ICASSP, Orlando, Florida, 2002

Laboratories LIDIAP

Record Appears in Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LIDIAP - L'IDIAP Laboratory
Scientific production and competences > Euler Center for Signal Processing
Work produced at EPFL
Technical Reports
Published

Record creation date 2006-03-10
