Enhancing posterior based speech recognition systems

Ketabdar, Hamed

doi:10.5075/epfl-thesis-4218

2008

Abstract

The use of local phoneme posterior probabilities has been increasingly explored for improving speech recognition systems. Hybrid hidden Markov model/artificial neural network (HMM/ANN) and Tandem systems are the most successful examples of this approach. In this thesis, we present a principled framework for enhancing the estimation of local posteriors by integrating phonetic and lexical knowledge, as well as long contextual information. This framework allows for hierarchical estimation, integration and use of local posteriors from the phoneme level up to the word level. We propose two approaches for enhancing the posteriors. In the first approach, phoneme posteriors estimated with an ANN (in particular, a multilayer perceptron, MLP) are used as emission probabilities in HMM forward-backward recursions. This yields new, enhanced posterior estimates that integrate HMM topological constraints (encoding specific phonetic and lexical knowledge) and long context. In the second approach, a temporal context of the regular MLP posteriors is post-processed by a secondary MLP in order to learn inter- and intra-dependencies among the phoneme posteriors. This learned knowledge is integrated into the posterior estimation during inference (the forward pass) of the second MLP, resulting in enhanced posteriors. The resulting local enhanced posteriors are investigated in a wide range of posterior-based speech recognition systems (e.g., Tandem and hybrid HMM/ANN), either as a replacement for, or in combination with, the regular MLP posteriors. The enhanced posteriors consistently outperform the regular posteriors in different applications on both small- and large-vocabulary databases.
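The first approach can be illustrated with a minimal sketch, under several loud assumptions. It uses a simple HMM with one state per phoneme and a hypothetical transition matrix; the thesis instead uses topologies that encode specific phonetic and lexical knowledge, which are not reproduced here. It also applies the usual hybrid HMM/ANN convention of turning MLP posteriors into scaled likelihoods by dividing by phoneme priors. The function name `enhanced_posteriors` and all numbers are hypothetical. The state occupancy probabilities produced by the forward-backward pass then serve as the enhanced posteriors, which condition on the whole utterance rather than on a single frame.

```python
import numpy as np


def enhanced_posteriors(mlp_post, priors, trans, init):
    """Hypothetical sketch: forward-backward over a phoneme HMM whose emissions
    are scaled likelihoods derived from MLP posteriors (assumed convention).

    mlp_post: (T, Q) frame-level MLP phoneme posteriors P(q | x_t)
    priors:   (Q,)   phoneme priors P(q)
    trans:    (Q, Q) trans[i, j] = P(q_t = j | q_{t-1} = i)  (assumed topology)
    init:     (Q,)   initial state distribution
    Returns:  (T, Q) enhanced posteriors P(q_t | x_1, ..., x_T)
    """
    T, Q = mlp_post.shape
    # Scaled likelihoods p(x_t | q) / p(x_t) = P(q | x_t) / P(q).
    emis = mlp_post / priors

    alpha = np.zeros((T, Q))
    beta = np.ones((T, Q))

    # Forward recursion, renormalised per frame to avoid underflow.
    alpha[0] = init * emis[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emis[t]
        alpha[t] /= alpha[t].sum()

    # Backward recursion, also renormalised per frame.
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (emis[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    # Per-frame scaling constants cancel once each row is normalised.
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # Toy example with 2 phonemes: the ambiguous middle frame (0.5 / 0.5)
    # is pulled toward its confident neighbours by the sticky transitions.
    post = np.array([[0.9, 0.1], [0.5, 0.5], [0.9, 0.1]])
    sticky = np.array([[0.9, 0.1], [0.1, 0.9]])
    uniform = np.array([0.5, 0.5])
    print(enhanced_posteriors(post, uniform, sticky, uniform))
```

With uniform transitions and priors, the sketch simply returns the MLP posteriors unchanged. The enhancement comes entirely from the HMM constraints (here, the self-loop probabilities), which propagate evidence from neighbouring frames.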

Details

Title Enhancing posterior based speech recognition systems

Author(s) Ketabdar, Hamed

Advisor(s)

Bourlard, Hervé

Pagination 178 pages

Date 2008

Publisher Lausanne, EPFL

Keywords

Posterior Based ASR; Artificial Neural Networks; Local Posteriors; Context; Phonetic and Lexical Knowledge; Enhanced Posteriors

Language English

DOI https://doi.org/10.5075/epfl-thesis-4218

Other identifier(s) URN: urn:nbn:ch:bel-epfl-thesis4218-2

Laboratories LIDIAP

Record Appears in Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LIDIAP - L'IDIAP Laboratory
Scientific production and competences > Euler Center for Signal Processing
Scientific production and competences > EPFL Theses
Work produced at EPFL
Published
Theses

Record creation date 2008-09-05