Automatic Speech Recognition using Dynamic Bayesian Networks with the Energy as an Auxiliary Variable

In current automatic speech recognition (ASR) systems, the energy is not used as part of the feature vector in spite of being a fundamental feature in the speech signal. The noise inherent in its estimation degrades the system performance. In this report we present an alternative approach for introducing the energy into the system so that it can help to enhance recognition. We present the experimental results of an ASR system based on dynamic Bayesian networks (DBNs) using the energy as an auxiliary variable. DBNs belong to the same family of statistical models as hidden Markov models (HMMs). However, DBNs are a more general framework and they allow more flexibility in defining new probabilistic relations between variables. We tried different network topologies and we noticed the benefit of conditioning the feature vector on the energy. Furthermore, hiding the value of the energy in recognition also improved the recognition performance.

Related material