Multi-stream processing for noise robust speech recognition

Misra, Hemant

doi:10.5075/epfl-thesis-3508

Misra, Hemant

2006

Download

Formats

Format
BibTeX
MARC
MARCXML
DublinCore
EndNote
NLM
RefWorks
RIS

Files

Abstract

In this thesis, the framework of multi-stream combination has been explored to improve the noise robustness of automatic speech recognition (ASR) systems. The central idea of multi-stream ASR is to combine information from several sources to improve the performance of a system. The two important issues of multi-stream systems are which information sources (feature representations) to combine and what importance (weights) be given to each information source. In the framework of hybrid hidden Markov model/artificial neural network (HMM/ANN) and Tandem systems, several weighting strategies are investigated in this thesis to merge the posterior outputs of multi-layered perceptrons (MLPs) trained on different feature representations. The best results were obtained by inverse entropy weighting in which the posterior estimates at the output of the MLPs were weighted by their respective inverse output entropies. In the second part of this thesis, two feature representations have been investigated, namely pitch frequency and spectral entropy features. The pitch frequency feature is used along with perceptual linear prediction (PLP) features in a multi-stream framework. The second feature proposed in this thesis is estimated by applying an entropy function to the normalized spectrum to produce a measure which has been termed spectral entropy. The idea of the spectral entropy feature is extended to multi-band spectral entropy features by dividing the normalized full-band spectrum into sub-bands and estimating the spectral entropy of each sub-band. The proposed multi-band spectral entropy features were observed to be robust in high noise conditions. Subsequently, the idea of embedded training is extended to multi-stream HMM/ANN systems. To evaluate the maximum performance that can be achieved by frame-level weighting, we investigated an "oracle test". We also studied the relationship of oracle selection to inverse entropy weighting and proposed an alternative interpretation of the oracle test to analyze the complementarity of streams in multi-stream systems. The techniques investigated in this work gave a significant improvement in performance for clean as well as noisy test conditions.

Details

Title Multi-stream processing for noise robust speech recognition

Author(s) Misra, Hemant

Advisor(s)

Bourlard, Hervé

Pagination 175

Date 2006

Publisher Lausanne, EPFL

Language English

DOI https://doi.org/10.5075/epfl-thesis-3508

Other identifier(s) urn: urn:nbn:ch:bel-epfl-thesis3508-8

Laboratories LIDIAP

Record Appears in Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LIDIAP - L'IDIAP Laboratory
Scientific production and competences > Euler Center for Signal Processing
Scientific production and competences > EPFL Theses
Work produced at EPFL
Published
Theses

Record creation date 2006-03-13

Actions

Preview

Select file: