Developing and Enhancing Posterior Based Speech Recognition Systems

Local state or phone posterior probabilities are often investigated as local scores (e.g., hybrid HMM/ANN systems) or as transformed acoustic features (e.g., ``Tandem'') to improve speech recogni tion systems. In this paper, we present initial results towards boosting these approaches by improving posterior estimat es, using acoustic context (e.g., as available in the whole utterance), as well as possible prior information (such as topological constraints). In the present work, the enhanced posterior distribution is associated with the ``gamma'' distribution typically used in standard HMMs training, and estimated from local likelihoods (GMM) or local posteriors (ANN). This approach results in a family of new HMM based systems, where only posterior probabilities are used, while also providing a new, principled, approach towards a hierarchical use/integration of these posteriors, from the frame level up to the phone and word levels, and integrating the appropriate context and prior knowledge in each level. In the present work, we used the resulting posteriors as local scores in a Viter bi decoder. On the OGI Numbers'95 database, this resulted in improved recognition performance, compared to a state-of-the-art hybrid HMM/ANN system.


Note: The status of this file is: Anyone

 Record created 2006-03-10, last modified 2020-10-24

Download fulltextPDF
External link:
Download fulltextURL
Rate this document:

Rate this document:
(Not yet reviewed)