Robust overlapping speech recognition based on neural networks

We address the problem of improving hands-free speech recognition performance in the presence of multiple simultaneous speakers recorded with multiple distant microphones. In this paper, we propose a log spectral mapping that uses neural networks to estimate the log mel-filterbank outputs of clean speech from multiple noisy speech signals. We investigate both the mapping of the far-field speech and the combination of the enhanced speech with the estimated interfering speech. Our neural network-based feature enhancement method incorporates noise information and can be viewed as a non-linear log spectral subtraction. Experimental studies on the MONC corpus show that the MLP-based mapping techniques improve recognition accuracy for overlapping speech.
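The idea of a learned, non-linear log spectral mapping can be illustrated with a small NumPy sketch. This is not the authors' implementation: the sizes `N_MELS`, `N_MICS` and `HIDDEN`, the tanh hidden layer, full-batch gradient descent on squared error, and the synthetic data are all hypothetical choices made for illustration. The network input concatenates the log-mel frames of every distant microphone with an estimate of the interfering speaker's log-mel frame (the toy data simply uses the true interferer in place of that estimate), and the output is the clean log-mel frame. For contrast, `log_spectral_subtraction` is a fixed-rule spectral subtraction applied to log-mel energies, the kind of linear-domain rule the MLP can be thought of as generalizing.

```python
import numpy as np

N_MELS = 24   # hypothetical number of mel filterbank channels
N_MICS = 2    # hypothetical number of distant microphones
HIDDEN = 64   # hypothetical hidden layer size


def log_spectral_subtraction(noisy_logmel, noise_logmel, floor=1e-3):
    """Fixed-rule baseline: subtract noise power, then return to the log domain.

    The result is floored at `floor` times the noisy power to avoid log(<=0).
    """
    clean_power = np.exp(noisy_logmel) - np.exp(noise_logmel)
    return np.log(np.maximum(clean_power, floor * np.exp(noisy_logmel)))


class LogSpectralMLP:
    """One-hidden-layer MLP mapping noisy log-mel features to clean log-mel features."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        # Non-linear hidden layer, linear output layer (log-mel values are unbounded).
        self.h = np.tanh(x @ self.W1 + self.b1)
        return self.h @ self.W2 + self.b2

    def train_step(self, x, target, lr=0.01):
        """One full-batch gradient step; returns the per-frame squared error before the step."""
        y = self.forward(x)
        err = y - target
        n = x.shape[0]
        loss = np.sum(err ** 2) / n
        # Backpropagation: compute every gradient before updating any weight.
        d_y = 2.0 * err / n
        d_W2 = self.h.T @ d_y
        d_b2 = d_y.sum(axis=0)
        d_a = (d_y @ self.W2.T) * (1.0 - self.h ** 2)
        d_W1 = x.T @ d_a
        d_b1 = d_a.sum(axis=0)
        self.W1 -= lr * d_W1
        self.b1 -= lr * d_b1
        self.W2 -= lr * d_W2
        self.b2 -= lr * d_b2
        return loss


def make_synthetic_batch(n_frames, rng):
    """Hypothetical toy data: clean and interfering log-mel frames mixed at each mic."""
    clean = rng.normal(0.0, 1.0, (n_frames, N_MELS))
    interf = rng.normal(0.0, 1.0, (n_frames, N_MELS))
    mics = []
    for m in range(N_MICS):
        gain = 0.5 + 0.5 * m  # hypothetical per-microphone interferer gain
        mics.append(np.log(np.exp(clean) + gain * np.exp(interf)))
    # Input vector: all far-field channels plus the interferer (stand-in for its estimate).
    x = np.concatenate(mics + [interf], axis=1)
    return x, clean


def demo(n_frames=500, steps=300, seed=0):
    """Train on toy data; returns (net, normalized inputs, first loss, last loss)."""
    rng = np.random.default_rng(seed)
    x, clean = make_synthetic_batch(n_frames, rng)
    # Per-dimension mean/variance normalization of the network inputs.
    x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)
    net = LogSpectralMLP(x.shape[1], HIDDEN, N_MELS, seed=seed)
    first = last = net.train_step(x, clean)
    for _ in range(steps):
        last = net.train_step(x, clean)
    return net, x, first, last
```

Calling `demo()` trains the toy network and returns the squared error before and after training. Unlike the fixed subtraction rule, which needs an explicit noise estimate and a flooring heuristic, the MLP learns its own mapping from all channels jointly, which is one way to read "non-linear log spectral subtraction".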
