Auxiliary Variables in Conditional Gaussian Mixtures for Automatic Speech Recognition

Stephenson, Todd Andrew; Magimai.-Doss, Mathew; Bourlard, Hervé

doi:10.21437/ICSLP.2002-357

2002

Abstract

In previous work, we presented a case study that used an estimated pitch value as the conditioning variable in conditional Gaussians. It showed the utility of hiding the pitch values in certain situations and of modeling them independently of the hidden state in others. Since that work used only single conditional Gaussians, we extend it here to conditional Gaussian mixtures in the emission distributions, which makes the approach more comparable to state-of-the-art automatic speech recognition. We also introduce a rate-of-speech (ROS) variable within the conditional Gaussian mixtures. We find that, under the current methods, using observed pitch or ROS in the recognition phase does not provide improvement. However, systems trained on pitch or ROS may improve over the baseline in the recognition phase when the pitch or ROS is marginalized out.
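
The difference between using an observed auxiliary variable and marginalizing it out can be illustrated with a minimal sketch. The abstract does not give the paper's parameterization, so everything below is an assumption made for illustration: a univariate feature, component means that shift linearly with the auxiliary value, a discretized auxiliary variable whose prior does not depend on the hidden state, and all function names and numbers (gaussian_pdf, conditional_mixture_pdf, marginalized_pdf, the component and prior values).

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def conditional_mixture_pdf(x, a, components):
    """p(x | a, state): a mixture whose component means depend on a.

    components is a list of (weight, offset, slope, var) tuples. Each
    component mean is offset + slope * a, an assumed linear-Gaussian form
    that is not taken from the paper.
    """
    return sum(
        weight * gaussian_pdf(x, offset + slope * a, var)
        for weight, offset, slope, var in components
    )


def marginalized_pdf(x, aux_prior, components):
    """p(x | state) = sum over a of p(a) * p(x | a, state).

    aux_prior is a list of (aux_value, probability) pairs summing to 1.
    The prior is assumed to be independent of the hidden state.
    """
    return sum(
        p_a * conditional_mixture_pdf(x, a, components) for a, p_a in aux_prior
    )


# Hypothetical two-component emission model for a single HMM state.
components = [(0.6, 0.0, 0.5, 1.0), (0.4, 2.0, -0.3, 0.5)]
# Hypothetical discretized auxiliary variable, e.g. low / mid / high pitch.
aux_prior = [(-1.0, 0.25), (0.0, 0.5), (1.0, 0.25)]

observed = conditional_mixture_pdf(1.0, 0.0, components)  # auxiliary value observed
hidden = marginalized_pdf(1.0, aux_prior, components)     # auxiliary value marginalized out
print(observed, hidden)
```

In this sketch, recognition with an observed auxiliary variable would call conditional_mixture_pdf with the measured pitch or ROS value, while recognition with a hidden auxiliary variable would call marginalized_pdf, which needs no measurement at test time.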

Details

Title Auxiliary Variables in Conditional Gaussian Mixtures for Automatic Speech Recognition

Author(s) Stephenson, Todd Andrew ; Magimai.-Doss, Mathew ; Bourlard, Hervé

Published in Seventh International Conference on Spoken Language Processing (ICSLP 2002)

Volume 4

Pages 2665-2668

Conference Seventh International Conference on Spoken Language Processing (ICSLP 2002), Denver, CO, USA

Date 2002

Keywords

stephenson; speech

DOI https://doi.org/10.21437/ICSLP.2002-357

Laboratories LIDIAP

Record creation date 2006-03-10
