Low cost duration modelling for noise robust speech recognition

Morris, Andrew; Payne, Simon; Bourlard, Hervé

doi:10.21437/ICSLP.2002-28

2002

Abstract

State transition matrices as used in standard HMM decoders have two widely perceived limitations. One is that the implicit geometric state duration distributions which they model do not accurately reflect true duration distributions. The other is that they impose no hard limit on maximum duration, with the result that state transition probabilities often have little influence when combined with acoustic probabilities, which are of a different order of magnitude. Explicit duration models were developed in the past to address the first problem. These were not widely taken up because their performance advantage in clean speech recognition was often too small to offset the extra complexity which they introduced. However, duration models have much greater potential when applied to noisy speech recognition. In this paper we present a simple and generic form of explicit duration model and show that it leads to strong performance improvements when applied to connected digit recognition in noise.
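
To make the first two limitations concrete, the following is a minimal sketch of the duration distribution implied by a single HMM state with a self-loop. It is not the duration model proposed in the paper. The function names and the self-loop probability 0.8 are assumptions chosen for illustration. With self-loop probability a, the probability of staying exactly d frames is a^(d-1) * (1 - a), which is always largest at d = 1 and leaves non-zero probability beyond any chosen maximum duration.

```python
def geometric_duration_pmf(self_loop_prob, max_frames):
    """Implicit duration probabilities P(d) = a**(d-1) * (1-a), d = 1..max_frames."""
    a = self_loop_prob
    return [(a ** (d - 1)) * (1.0 - a) for d in range(1, max_frames + 1)]


def expected_duration(self_loop_prob):
    """Mean of the geometric duration distribution, in frames."""
    return 1.0 / (1.0 - self_loop_prob)


def tail_mass(self_loop_prob, max_frames):
    """Probability that the state lasts longer than max_frames: a**max_frames."""
    return self_loop_prob ** max_frames


# Hypothetical self-loop probability, not taken from the paper.
a = 0.8
pmf = geometric_duration_pmf(a, 10)

print("P(d) for d = 1..10:", [round(p, 4) for p in pmf])
print("expected duration (frames):", expected_duration(a))
print("probability of lasting more than 10 frames:", tail_mass(a, 10))
```

Whatever value of a is chosen, the most likely duration is a single frame and the tail never reaches zero, which is why a transition matrix alone cannot express a typical duration or a hard maximum.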

Details

Title Low cost duration modelling for noise robust speech recognition

Author(s) Morris, Andrew ; Payne, Simon ; Bourlard, Hervé

Published in 7th International Conference on Spoken Language Processing (ICSLP 2002)

Pages 1025-1028

Conference ICSLP, Denver, Colorado, USA

Date 2002

Keywords

duration models; speech; HMMs; noise robust ASR

DOI https://doi.org/10.21437/ICSLP.2002-28

Laboratories LIDIAP

Record Appears in Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LIDIAP - L'IDIAP Laboratory
Scientific production and competences > Euler Center for Signal Processing
Conference Papers
Work produced at EPFL
Published

Record creation date 2006-03-10
