Abstract

In this paper, we present and investigate a new method for subband-based Automatic Speech Recognition (ASR) that approximates the ideal `full combination' approach, which is itself often impractical to realize. The `full combination' approach explicitly considers all possible combinations of subbands (\cite{Hermansky96:TAO}), avoiding the independence assumption that is usually necessary and that would limit the potential of subband-based ASR. We show how this ideal approach can be realized by a nonlinear combination function in which the fullband posterior probabilities are decomposed into a weighted sum of posterior probabilities from Artificial Neural Network (ANN) experts. This requires training one expert for each possible subband combination. To limit such extensive training, we have found that comparable results can be achieved by estimating the posteriors for each subband combination as a function of the posteriors from the individual subbands alone (\cite{Hagen98:SBS,Morris99:TFC}). We present the theoretical foundation of our solution to the ideal `full combination' approach, the nonlinear combination function, and its approximation. The weights, which represent the relative utility of each subband combination for recognition, are central to this technique, and possible schemes for their estimation are proposed. These schemes have been tested and compared in the framework of HMM/ANN hybrid systems on clean and noise-added data.
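
The weighted-sum decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact approximation, so the product rule used here (posteriors of a combination estimated from single-subband posteriors under an assumed conditional independence of subbands given the class) is an assumption for illustration only, as are all function names, the uniform weights, and the toy numbers.

```python
from itertools import combinations
from math import prod


def subband_combinations(n_subbands):
    """Return every subset of subband indices, including the empty set."""
    idx = range(n_subbands)
    return [c for r in range(n_subbands + 1) for c in combinations(idx, r)]


def approx_combination_posterior(subband_posteriors, priors, combo):
    """Estimate P(class | subbands in combo) from single-subband posteriors.

    ASSUMPTION (not stated in the abstract): subbands are conditionally
    independent given the class, so the posterior is proportional to
    prior**(1 - |combo|) * product of the single-subband posteriors.
    Priors must be strictly positive. The empty combination yields the priors.
    """
    scores = [
        priors[k] ** (1 - len(combo))
        * prod(subband_posteriors[i][k] for i in combo)
        for k in range(len(priors))
    ]
    total = sum(scores)
    return [s / total for s in scores]


def full_combination_posterior(subband_posteriors, priors, weights):
    """Weighted sum of combination posteriors; weights map combo -> weight."""
    result = [0.0] * len(priors)
    for combo in subband_combinations(len(subband_posteriors)):
        post = approx_combination_posterior(subband_posteriors, priors, combo)
        for k, p in enumerate(post):
            result[k] += weights[combo] * p
    return result


# Toy example: two subbands, two classes, hypothetical uniform weights.
priors = [0.5, 0.5]
subband_posteriors = [[0.9, 0.1], [0.6, 0.4]]
combos = subband_combinations(2)
weights = {c: 1.0 / len(combos) for c in combos}
fullband = full_combination_posterior(subband_posteriors, priors, weights)
```

With weights that sum to one, the result is again a probability distribution over classes. In practice the weights would come from one of the estimation schemes the paper proposes rather than being uniform.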
