Data utility modelling for mismatch reduction

Morris, Andrew

2001

Abstract

In the "missing data" (MD) approach to noise-robust automatic speech recognition (ASR), speech models are trained on clean data, and during recognition, sections of spectral data dominated by noise are detected and treated as "missing". However, this all-or-nothing hard decision about which data is missing does not accurately reflect the probabilistic nature of missing data detection. Recent work has shown greatly improved performance with the "soft missing data" (SMD) approach, in which the "missing" status of each data value is represented by a continuous probability rather than a 0/1 value. This probability is then used to weight the different likelihood contributions that the MD model normally assigns to each spectral observation according to its "missing" status. This article presents an analysis showing that the SMD approach effectively implements a maximum a posteriori (MAP) decoding strategy with missing or uncertain data, subject to the interpretation that the missing/not-missing probabilities are weights for a mixture pdf that models the pdf of each hidden clean data input, after conditioning on the noisy data input, a local noise estimate, and any information which may be available. An important feature of this "soft data" model is that control over the "evidence pdf" can provide a principled framework not only for ignoring unreliable data, but also for focusing attention on more discriminative features, and for data enhancement.
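The weighting described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a single Gaussian state model per spectral channel, and it assumes that the "missing" contribution is a bounded marginalisation in which the clean value is taken to lie uniformly between 0 and the observed noisy value. The function names (`gauss_pdf`, `gauss_cdf`, `soft_md_likelihood`) and parameters are hypothetical.

```python
import math


def gauss_pdf(x, mean, std):
    """Gaussian density at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gauss_cdf(x, mean, std):
    """Gaussian cumulative distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def soft_md_likelihood(x, p_present, mean, std):
    """Soft missing data likelihood for one spectral channel.

    x         : observed noisy spectral value (assumed > 0)
    p_present : probability that the value is reliable (not "missing")
    mean, std : parameters of the clean-speech state model (assumed Gaussian)
    """
    # Contribution if the value is reliable: evaluate the state pdf at x.
    present = gauss_pdf(x, mean, std)
    # Contribution if the value is "missing" (assumption: the clean value is
    # bounded by the noisy observation, so average the state pdf over [0, x]).
    missing = (gauss_cdf(x, mean, std) - gauss_cdf(0.0, mean, std)) / x
    # The soft decision weights the two contributions by the "missing" status.
    return p_present * present + (1.0 - p_present) * missing
```

With `p_present` set to 1 or 0 the function reduces to the two hard-decision cases, so the hard MD approach appears as a special case of the soft one under these assumptions.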

Details

Title: Data utility modelling for mismatch reduction

Author(s): Morris, Andrew

Published in: Proc. CRAC (workshop on Consistent & Reliable Acoustic Cues for sound analysis)

Conference: CRAC (workshop on Consistent & Reliable Acoustic Cues for sound analysis)

Date: 2001

Publisher: Aalborg, Denmark

Keywords

soft missing data; speech; robust ASR; data utility

Laboratories: LIDIAP

Record creation date: 2006-03-10
