Multi-Stream Speech Recognition

In this paper, we discuss a new automatic speech recognition (ASR) approach based on independent processing and recombination of several feature streams. In this framework, it is assumed that the speech signal is represented in terms of multiple input streams, each input stream representing a different characteristic of the signal. If the streams are entirely synchronous, they may be accommodated simply (as they usually are in state-of-the-art systems). However, as discussed in the paper, it may be required to permit some degree of asynchrony between streams. This paper introduces the basic framework of a statistical structure that can accommodate multiple (asynchronous) observation streams (possibly exhibiting different frame rates). This approach will then be applied to the particular case of multi-band speech recognition and will be shown to yield significantly better noise robustness.

Related material