Automatic Speech Recognition using Pitch Information in Dynamic Bayesian Networks

Automatic speech recognition (ASR) becomes more difficult when speakers vary. Automatically selecting different acoustic models according to speaker type may help to increase the robustness of ASR. We present a system that attempts to do so by augmenting the standard acoustic observations with pitch information, which allows it to use acoustic models more appropriate to speech with the given pitch. Furthermore, pitch information is more easily detected in noisy conditions and may therefore be of use in robust speech recognition. Using dynamic Bayesian networks (DBNs) allows further refinement of the system by eliminating unnecessary statistical dependencies, thus reducing the number of parameters. We show that a system trained on observed pitch data and performing recognition with missing pitch data can perform significantly better than a system that uses acoustic information only.
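The idea of recognizing with missing pitch data can be illustrated with a minimal sketch. It is not the system described here; it only shows how a pitch variable can be summed out of a pitch-conditioned acoustic model when pitch is unobserved. The discretization of pitch into two bins, the single scalar acoustic feature, and all names and parameter values (`PITCH_PRIOR`, `ACOUSTIC_MODEL`, and the two likelihood functions) are assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


# Hypothetical parameters for a single speech state q.
# Pitch is assumed to be discretized into two bins (an illustrative choice).
PITCH_PRIOR = {"low": 0.6, "high": 0.4}                    # P(pitch | q)
ACOUSTIC_MODEL = {"low": (0.0, 1.0), "high": (2.0, 1.0)}   # (mean, var) of p(x | q, pitch)


def likelihood_observed_pitch(x, pitch):
    """Joint likelihood p(x, pitch | q), used when pitch is observed."""
    mean, var = ACOUSTIC_MODEL[pitch]
    return PITCH_PRIOR[pitch] * gaussian_pdf(x, mean, var)


def likelihood_hidden_pitch(x):
    """Marginal likelihood p(x | q), summing over the unobserved pitch bin."""
    return sum(likelihood_observed_pitch(x, pitch) for pitch in PITCH_PRIOR)
```

With pitch hidden, the state's acoustic model behaves like a mixture whose components were each fitted to speech in one pitch range, which is one plausible reading of how training on observed pitch could still help when pitch is missing at recognition time.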
