Subband-Based Speech Recognition in Noisy Conditions: The Full Combination Approach

In this report, we investigate and compare different subband-based Automatic Speech Recognition (ASR) approaches, including an original approach, referred to as the "full combination approach", based on an estimate of the (noise-)weighted sum of posterior probabilities over all possible subband combinations. We show that the proposed estimate is a good approximation of the ideal, but often impractical, solution that explicitly considers every possible subband subset. This approximation results in a nonlinear, yet simple and easy-to-implement, combination function. Compared with other subband-based approaches, we believe that the proposed solution is closer to optimal (mathematically better founded) and allows us to relax some of the (subband) independence assumptions. Analogous to this full posterior combination approach, which combines the subbands after independent processing, we also investigate a full feature combination approach, in which all possible subband features are orthogonalized and combined into a single feature vector before probability estimation. The different approaches have been tested and compared on the Numbers'95 database (free-format numbers) with different levels of Noisex'92 car noise, using two different acoustic features, namely PLP and J-RASTA-PLP, and different weighting schemes. These experiments show that the full combination approximation yields very good estimates of the actual full combination posteriors and that both approaches yield very good recognition performance.
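The ideal (explicit) full combination described above can be written as a weighted sum over all subband subsets S, P(q|X) = Σ_S w_S P(q|X_S). The following is a minimal sketch of that explicit sum only, not of the report's approximation. It assumes a hypothetical `posterior_for_subset` callback standing in for one posterior estimator per subset, and it derives the weights w_S from hypothetical per-band reliabilities r_b (the probability that band b is usable), treated as independent. It also assumes that the empty subset contributes the class priors. With B subbands there are 2^B subsets, each needing its own estimator, which is why the explicit solution quickly becomes impractical.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Sequence

Posterior = List[float]  # one probability per class q_k


def subset_weights(reliabilities: Sequence[float]) -> Dict[FrozenSet[int], float]:
    """Weight each subband subset S by the probability that exactly the bands
    in S are reliable, ASSUMING independent per-band reliabilities r_b.
    The weights over all 2^B subsets (including the empty one) sum to 1."""
    n = len(reliabilities)
    weights: Dict[FrozenSet[int], float] = {}
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            s = frozenset(subset)
            w = 1.0
            for b, r in enumerate(reliabilities):
                # Band b is "reliable" if it is in S, "unreliable" otherwise.
                w *= r if b in s else (1.0 - r)
            weights[s] = w
    return weights


def full_combination(
    posterior_for_subset: Callable[[FrozenSet[int]], Posterior],
    weights: Dict[FrozenSet[int], float],
) -> Posterior:
    """Explicit full combination: P(q|X) = sum over S of w_S * P(q | X_S)."""
    combined: List[float] = []
    for s, w in weights.items():
        p = posterior_for_subset(s)  # hypothetical per-subset estimator
        if not combined:
            combined = [0.0] * len(p)
        for k, pk in enumerate(p):
            combined[k] += w * pk
    return combined


if __name__ == "__main__":
    # Toy example: 2 subbands, 2 classes; the posteriors are invented numbers.
    toy = {
        frozenset(): [0.5, 0.5],        # empty subset: class priors (assumed)
        frozenset({0}): [0.8, 0.2],
        frozenset({1}): [0.4, 0.6],
        frozenset({0, 1}): [0.9, 0.1],
    }
    w = subset_weights([0.9, 0.5])      # band 0 clean, band 1 half-corrupted
    print(full_combination(lambda s: toy[s], w))
```

With these toy numbers the combined posterior is approximately [0.81, 0.19]. Because the subset weights sum to one and each per-subset posterior is a distribution, the combination is itself a valid distribution over classes.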
