Continuous Microphone Array Speech Recognition on Wall Street Journal Corpus

In this paper, we present a robust speech acquisition system that captures continuous speech using a microphone array. We also present a microphone array based speech recognition system to study the environmental interference caused by reverberation, background noise, and mismatch between training and testing conditions. This is important in the context of the smart meeting rooms of the Augmented MultiParty Interaction (AMI) project, which aims at significant advances in conversational speech recognition. To this end, an audio-visual database of Wall Street Journal phrases was recorded in a real meeting room for stationary-speaker, moving-speaker, and overlapping-speech scenarios. We carried out speech enhancement and continuous speech recognition experiments on the stationary-speaker data. A microphone array with a beamformer followed by a postfilter yields speech quality slightly inferior to that of a close-talking headset microphone and better than that of a lapel microphone. We achieved a significant reduction in word error rates using models adapted with maximum likelihood linear regression (MLLR) and maximum a posteriori (MAP) approaches. Although the error rates on the microphone array data are higher than those on the headset data, they are significantly lower than the error rates on the lapel data.
