Abstract

In the "missing data" (MD) approach to noise-robust automatic speech recognition (ASR), speech models are trained on clean data, and during recognition, sections of spectral data dominated by noise are detected and treated as "missing". However, this all-or-nothing hard decision about which data is missing does not accurately reflect the probabilistic nature of missing data detection. Recent work has shown greatly improved performance with the "soft missing data" (SMD) approach, in which the "missing" status of each data value is represented by a continuous probability rather than a 0/1 value. This probability is then used to weight the different likelihood contributions that the MD model normally assigns to each spectral observation according to its "missing" status. This article presents an analysis showing that the SMD approach effectively implements a maximum a posteriori (MAP) decoding strategy with missing or uncertain data, subject to the interpretation that the missing/not-missing probabilities are weights for a mixture pdf which models the pdf for each hidden clean data input, after conditioning on the noisy data input, a local noise estimate, and any other information which may be available. An important feature of this "soft data" model is that control over the "evidence pdf" can provide a principled framework not only for ignoring unreliable data, but also for focusing attention on more discriminative features, and for data enhancement.
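
The weighting described in the abstract can be illustrated with a minimal sketch for a single spectral value. Everything specific here is an assumption made for illustration and is not taken from the abstract: a one-dimensional Gaussian state model, a "missing" contribution computed by bounded marginalisation (the clean value is assumed to lie uniformly between 0 and the observed value), and the hypothetical names `soft_md_likelihood` and `p_present`.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gaussian_cdf(x, mean, std):
    """Cumulative distribution of a 1-D Gaussian at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def soft_md_likelihood(x, p_present, mean, std):
    """Mix the 'present' and 'missing' likelihood contributions.

    x         : observed (noisy) spectral value, assumed > 0
    p_present : probability that x is reliable (1 - P(missing))
    mean, std : clean-speech Gaussian parameters (assumed model)
    """
    if x <= 0.0:
        raise ValueError("this sketch assumes a positive spectral value")
    # Contribution if the value is reliable: evaluate the clean model at x.
    present_term = gaussian_pdf(x, mean, std)
    # Contribution if the value is noise-dominated (assumption: bounded
    # marginalisation, clean value uniform on [0, x]).
    missing_term = (gaussian_cdf(x, mean, std) - gaussian_cdf(0.0, mean, std)) / x
    # Soft decision: weight the two contributions by the probability.
    return p_present * present_term + (1.0 - p_present) * missing_term
```

Setting `p_present` to 1 or 0 recovers the two hard-decision cases, so in this sketch the hard MD approach is the special case in which every probability is 0 or 1.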
