Improving Posterior Based Confidence Measures in Hybrid HMM/ANN Speech Recognition Systems
In this paper we define and investigate a set of confidence measures based on hybrid Hidden Markov Model/Artificial Neural Network (HMM/ANN) acoustic models. All of these measures use the neural network to estimate local phone posterior probabilities, which are then combined and normalized in different ways. Experimental results show that appropriate duration normalization is very important for obtaining good estimates of phone and word confidences. The measures are evaluated at the phone and word levels on both an isolated word task (PHONEBOOK) and a continuous speech recognition task (BREF). One of these confidence measures is shown to be well suited for utterance verification, and (as one would expect) confidence measures at the word level perform better than those at the phone level. Finally, using the resulting approach to rescore the N-best list on PHONEBOOK is shown to yield a 34% decrease in word error rate.
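To illustrate why the choice of duration normalization matters, the following minimal sketch compares two plausible ways of combining frame-level phone posteriors into a word confidence. The abstract does not give the exact definitions used in the paper, so the function names, the flooring constant `EPS`, and the two normalization schemes shown here are assumptions made for illustration only.

```python
import math

# Assumed floor to avoid log(0); not taken from the paper.
EPS = 1e-10


def phone_confidence(posteriors):
    """Mean log posterior over the frames aligned to one phone
    (a duration-normalized phone score; hypothetical definition)."""
    if not posteriors:
        raise ValueError("a phone segment needs at least one frame")
    return sum(math.log(max(p, EPS)) for p in posteriors) / len(posteriors)


def word_confidence_frame_norm(phones):
    """Sum log posteriors over all frames of the word, then divide by
    the total number of frames. Long phones dominate the score."""
    total_frames = sum(len(seg) for seg in phones)
    if total_frames == 0:
        raise ValueError("a word needs at least one frame")
    total = sum(math.log(max(p, EPS)) for seg in phones for p in seg)
    return total / total_frames


def word_confidence_phone_norm(phones):
    """Normalize each phone by its own duration first, then average
    over phones. Every phone contributes equally."""
    if not phones:
        raise ValueError("a word needs at least one phone")
    return sum(phone_confidence(seg) for seg in phones) / len(phones)


if __name__ == "__main__":
    # One long, well-matched phone followed by one short, poorly matched phone.
    word = [[0.9, 0.9, 0.9, 0.9], [0.1]]
    print(word_confidence_frame_norm(word))
    print(word_confidence_phone_norm(word))
```

In this toy example the frame-level normalization lets the long, well-matched phone mask the short, poorly matched one, whereas the phone-level normalization penalizes the mismatch more strongly. Which scheme works best on PHONEBOOK or BREF is an empirical question addressed by the paper, not by this sketch.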
Record created on 2006-03-10, modified on 2016-08-08