Pronunciation models and their evaluation using confidence measures

In this report, we present preliminary experiments towards the automatic inference and evaluation of pronunciation models, based on multiple utterances of each lexicon word and its given baseline pronunciation model (baseform phonetic transcription). In the present system, pronunciation models are extracted by decoding each training utterance through a series of hidden Markov models (HMMs). These HMMs are first initialized to allow only the generation of the baseline transcription and are then iteratively relaxed until they converge to a fully ergodic HMM. Each generated pronunciation model is then evaluated by its confidence measure and by its Levenshtein distance from the baseform model. The goal of this study is twofold. First, we show that this approach is suitable for generating robust pronunciation variants. Second, we intend to use this approach to optimize the pronunciation models by modifying or extending the acoustic features so as to increase their confidence scores. In other words, whereas classical pronunciation modeling approaches usually make the models increasingly complex to capture pronunciation variability, we intend to keep the pronunciation models fixed and optimize the acoustic parameters to maximize their matching and discriminative properties.
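Comparing a generated pronunciation with the baseform by Levenshtein distance can be sketched as follows. This is a minimal illustration, assuming pronunciations are lists of phone symbols and that insertions, deletions and substitutions all cost 1. The per-variant confidence value and the ordering rule in `rank_variants` are hypothetical placeholders, not the confidence measure used in this report, and the phone symbols are illustrative only.

```python
from typing import List, Sequence, Tuple


def levenshtein(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Minimum number of phone insertions, deletions and substitutions
    (unit costs) needed to turn `hyp` into `ref`."""
    # prev[j] is the distance between the phones of hyp consumed so far
    # and the first j phones of ref.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete phone h from hyp
                curr[j - 1] + 1,         # insert phone r into hyp
                prev[j - 1] + (h != r),  # substitute (no cost if equal)
            )
        prev = curr
    return prev[-1]


def rank_variants(
    variants: List[Tuple[List[str], float]], baseform: List[str]
) -> List[Tuple[List[str], float, int]]:
    """Hypothetical ranking: higher confidence first, then smaller
    distance from the baseform. `variants` holds (phones, confidence)."""
    scored = [(phones, conf, levenshtein(phones, baseform))
              for phones, conf in variants]
    return sorted(scored, key=lambda t: (-t[1], t[2]))


if __name__ == "__main__":
    baseform = ["z", "ih", "r", "ow"]  # illustrative baseform
    variants = [(["z", "ih", "ow"], 0.6), (["z", "iy", "r", "ow"], 0.8)]
    for phones, conf, dist in rank_variants(variants, baseform):
        print(" ".join(phones), conf, dist)
```

Sorting on confidence first reflects the abstract's emphasis on confidence scores, while the distance acts as a tie-breaker that favours variants close to the baseform; other combinations (for example a weighted sum) would be equally plausible.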
