Some applications of a priori knowledge in multi-stream HMM and HMM/ANN based ASR

Morris, Andrew

Morris, Andrew

2000

Formats

Format
BibTeX
MARCXML
TextMARC
MARC
DataCite
DublinCore
EndNote
NLM
RefWorks
RIS

Files

Abstract

Multi-band ASR was largely inspired by the extremely high level of redundancy in the spectral signal representation which can be inferred from Fletcher's product-of-errors rule for human speech perception. Indeed, the main aim of the multi-band approach is to exploit this redundancy in order to overcome the problem of data mismatch (while making no assumptions about noise type) by focusing recognition on sub-bands estimated to contain reliable, or "clean speech like", data. However, multi-band processing also presents the opportunity to introduce a number of other ideas from phonetics, non-linear phonology and auditory processing into the recognition process. In particular: we can weight sub-bands, or sub-band combinations, according to the most likely frequency range of characteristic features for the phoneme whose presence we are testing for; we can allow some degree of asynchrony between sub-bands, and we can preprocess each sub-band according the kind of acoustic features which we expect to find there. Besides combining sub-band experts, we can also combine multiple full-band experts, where each expert is perhaps suited to extracting complementary sources of speech information, or is robust to different kinds of noise. In this article we present an outline of some of the recent work at IDIAP, and cooperating institutions, in bringing together ideas from different areas of speech science within the framework of multi-stream HMM and HMM/ANN based ASR.

Details

Title Some applications of a priori knowledge in multi-stream HMM and HMM/ANN based ASR

Author(s) Morris, Andrew

Published in Phonus No.5,Dec.2000, ISSN 0949-1791, Proc. Workshop on Phonetics and Phonology in ASR

Conference Phonus No.5,Dec.2000, ISSN 0949-1791, Proc. Workshop on Phonetics and Phonology in ASR

Date 2000

Publisher Saarbruecken, Germany

Keywords

speech; non-linear phonology; multi-band and multi-stream processing; noise robust ASR; combination of experts

Additional link URL

Laboratories LIDIAP

Record Appears in Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LIDIAP - L'IDIAP Laboratory
Scientific production and competences > Euler Center for Signal Processing
Conference Papers
Work produced at EPFL
Published

Record creation date 2006-03-10

Files

Abstract

Details

PDF