Infoscience

Report

Noise PDF transformation in secondary feature processing

Motivated by the human ability to maintain a high level of speech recognition when large parts of the spectrogram are masked (i.e. dominated) by noise, the original "missing data" (MD) approach to noise-robust speech recognition trains models on clean speech and, during recognition, ignores the parts of the spectrogram identified as noise-dominated by marginalising over the clean data pdf. However, the implied rule that each spectral data value should be treated as either 100% clean or completely missing is inaccurate. The performance of MD-based recognition has improved steadily over the last few years with each increase in the accuracy with which clean-data uncertainty is modelled. Another assumption of the MD approach, which is more reasonable, is that it is often relatively easy to obtain an accurate estimate of the local noise spectrum. In this report we present an analysis of the way in which uncertainty in the noise spectrum is transformed into uncertainty in the clean speech spectrum. The uptake of this approach will depend on the existence of closed-form, computationally feasible solutions to the equations presented here. This is a preliminary study and no empirical tests are included. It is intended as a theoretical foundation from which practical solutions may be developed in future.
