Developing and Enhancing Posterior Based Speech Recognition Systems

Local state or phone posterior probabilities are often investigated as local scores (e.g., hybrid HMM/ANN systems) or as transformed acoustic features (e.g., "Tandem") to improve speech recognition systems. In this paper, we present initial results towards boosting these approaches by improving posterior estimates, using acoustic context (e.g., as available in the whole utterance), as well as possible prior information (such as topological constraints). In the present work, the enhanced posterior distribution is associated with the "gamma" distribution typically used in standard HMM training, and estimated from local likelihoods (GMM) or local posteriors (ANN). This approach results in a family of new HMM-based systems, where only posterior probabilities are used, while also providing a new, principled approach towards a hierarchical use/integration of these posteriors, from the frame level up to the phone and word levels, and integrating the appropriate context and prior knowledge at each level. In the present work, we used the resulting posteriors as local scores in a Viterbi decoder. On the OGI Numbers'95 database, this resulted in improved recognition performance, compared to a state-of-the-art hybrid HMM/ANN system.
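
As a rough illustration of the "gamma" posterior mentioned in the abstract, the sketch below runs a textbook scaled forward-backward recursion over a whole utterance to obtain per-frame state posteriors that take the acoustic context and the HMM topology into account. It is not taken from the paper: the function name `gamma_posteriors`, its arguments, and the assumption that ANN posteriors are first divided by state priors to form scaled likelihoods are all hypothetical choices made for this example.

```python
def gamma_posteriors(local_scores, trans, init):
    """Return gamma[t][i] = P(state i at frame t | whole utterance).

    Assumptions (illustrative, not from the paper):
      local_scores[t][i]: local likelihood of frame t under state i (e.g. from
                          a GMM), or an ANN posterior divided by the state prior.
      trans[i][j]:        transition probability from state i to state j;
                          zeros encode topological constraints.
      init[i]:            initial state probability.
    """
    n_frames = len(local_scores)
    n_states = len(init)

    def normalise(values):
        total = sum(values)
        if total <= 0.0:
            raise ValueError("no valid path through the model")
        return [v / total for v in values]

    # Forward pass: alpha[t][j] is proportional to p(x_1..x_t, q_t = j).
    alpha = [normalise([init[i] * local_scores[0][i] for i in range(n_states)])]
    for t in range(1, n_frames):
        prev = alpha[-1]
        alpha.append(normalise([
            local_scores[t][j] * sum(prev[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]))

    # Backward pass: beta[t][i] is proportional to p(x_{t+1}..x_T | q_t = i).
    beta = [[1.0] * n_states for _ in range(n_frames)]
    for t in range(n_frames - 2, -1, -1):
        beta[t] = normalise([
            sum(trans[i][j] * local_scores[t + 1][j] * beta[t + 1][j]
                for j in range(n_states))
            for i in range(n_states)
        ])

    # Combine and renormalise per frame, so the arbitrary scaling cancels.
    return [
        normalise([alpha[t][i] * beta[t][i] for i in range(n_states)])
        for t in range(n_frames)
    ]
```

Under these assumptions, the returned values could be passed to a Viterbi decoder as local scores in place of the raw frame-level estimates. A zero entry in `trans` or `init` forces the corresponding posterior mass to zero, which is one simple way the topological constraints can enter the estimate.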

Published in:
Proceedings of Interspeech
Presented at:
Proceedings of Interspeech
Lisbon, Portugal
IDIAP-RR 05-23

 Record created 2006-03-10, last modified 2018-03-17
