Dynamic Bayesian Network Based Speech Recognition with Pitch and Energy as Auxiliary Variables

Stephenson, Todd Andrew; Escofet, Jaume; Magimai.-Doss, Mathew; Bourlard, Hervé


2002


Abstract

Pitch and energy are two fundamental features of speech and are important in human speech recognition. However, when incorporated as features in automatic speech recognition (ASR), they usually cause a significant degradation in recognition performance because of the noise inherent in estimating or modeling them. In this paper, we show experimentally how this can be corrected either by conditioning the emission distributions upon these features or by marginalizing out these features during recognition. Since neither approach is straightforward with standard hidden Markov models (HMMs), this work was carried out in the framework of dynamic Bayesian networks (DBNs), which give more flexibility in defining the topology of the emission distributions and in specifying whether variables should be marginalized out.
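
The two strategies named in the abstract can be illustrated with a small sketch. This is not the paper's model: it assumes a single scalar acoustic feature x, a single scalar auxiliary variable a (standing in for pitch or energy), and a linear-Gaussian emission for one hidden state q. All function names and parameter values are hypothetical. Conditioning evaluates p(x | q, a) with a observed; marginalizing evaluates p(x | q), the integral of p(x | q, a) p(a | q) over a, which has a closed form under these assumptions.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def emission_conditioned(x, a, mu, b, var):
    """p(x | q, a): the auxiliary value a is observed and shifts the mean."""
    return gaussian_pdf(x, mu + b * a, var)


def emission_marginalized(x, mu, b, var, aux_mean, aux_var):
    """p(x | q): a is integrated out in closed form (linear-Gaussian case)."""
    return gaussian_pdf(x, mu + b * aux_mean, var + b * b * aux_var)


def emission_marginalized_numeric(x, mu, b, var, aux_mean, aux_var, steps=4000):
    """Same quantity by midpoint-rule integration over a, as a sanity check."""
    half_width = 10.0 * math.sqrt(aux_var)
    da = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        a = aux_mean - half_width + (i + 0.5) * da
        total += emission_conditioned(x, a, mu, b, var) * gaussian_pdf(a, aux_mean, aux_var)
    return total * da


# Hypothetical parameters for one hidden state q (not taken from the paper).
params = dict(mu=0.0, b=0.5, var=1.0)
aux = dict(aux_mean=2.0, aux_var=4.0)

closed = emission_marginalized(1.3, **params, **aux)
numeric = emission_marginalized_numeric(1.3, **params, **aux)
print(abs(closed - numeric) < 1e-6)
```

In a recognizer built along these lines, the conditioned form would be used when the pitch or energy estimate is treated as observed, and the marginalized form when the auxiliary variable is hidden at recognition time, so that a noisy estimate cannot distort the emission score.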

Details

Title Dynamic Bayesian Network Based Speech Recognition with Pitch and Energy as Auxiliary Variables

Author(s) Stephenson, Todd Andrew ; Escofet, Jaume ; Magimai.-Doss, Mathew ; Bourlard, Hervé

Date 2002

Publisher IDIAP

Keywords

stephenson; speech

Note In "2002 IEEE International Workshop on Neural Networks for Signal Processing (NNSP 2002)", 2002


Laboratories LIDIAP


Record creation date 2006-03-10
