From missing data to maybe useful data: soft data modelling for noise robust ASR

Much research has focused on the problem of achieving automatic speech recognition (ASR) that approaches human recognition performance in its robustness to noise and channel distortion. We present a new approach to data modelling that has the potential to combine complementary state-of-the-art techniques for speech enhancement and noise adaptation into a single process. In the "missing feature theory" (MFT) approach to noise-robust ASR, misinformative spectral data is detected and then ignored. Recent work has shown that MFT ASR improves greatly when the usual hard decision to exclude data features is softened into a continuous weighting between the likelihood contributions normally used with MFT for "clean" and "missing" data. The new model presented here can be seen as a generalisation of this "soft missing data" approach: the mixture pdf that is implicitly used to model clean or missing observation data is recognised as the data posterior pdf, and is modelled accordingly. Initial "soft data" experiments compare the performance of different soft missing data models against the performance of a baseline Gaussian mixture HMM. The test used is the Aurora 2.0 task for speaker-independent continuous digit recognition.
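
The continuous weighting described above can be illustrated with a minimal sketch for a single spectral feature. It is not taken from the paper. It assumes a one-dimensional Gaussian state model, a "missing" contribution computed as the Gaussian averaged over the interval between a lower bound and the observed value (one common bounded-marginalisation form), and a weight w in [0, 1] giving the degree to which the feature is believed clean. The function names, the parameter names and the lower bound of zero are all hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gaussian_cdf(x, mean, var):
    """Cumulative distribution of the same 1-D Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def soft_missing_likelihood(x, w, mean, var, lower=0.0):
    """Weighted mix of the 'clean' and 'missing' likelihood contributions.

    w = 1 treats the feature as clean, w = 0 as missing (hard MFT decisions);
    values in between give the soft weighting. Assumed form, for illustration.
    """
    clean = gaussian_pdf(x, mean, var)
    if x > lower:
        # Assumed 'missing' term: the clean value lies somewhere in
        # [lower, x], so average the state density over that interval.
        mass = gaussian_cdf(x, mean, var) - gaussian_cdf(lower, mean, var)
        missing = mass / (x - lower)
    else:
        # Degenerate interval: fall back to the point density.
        missing = clean
    return w * clean + (1.0 - w) * missing


# Example: the same observation scored as clean, half-reliable, and missing.
for weight in (1.0, 0.5, 0.0):
    print(weight, soft_missing_likelihood(1.0, weight, mean=0.0, var=1.0))
```

Setting the weight to exactly 0 or 1 recovers the two hard MFT cases, so the soft model contains the hard decision as a special case under these assumptions.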
