In-Context Phone Posteriors as Complementary Features for Tandem ASR

Ketabdar, Hamed; Bourlard, Hervé

conference paper not in proceedings

2008

ICSLP'08

In this paper, we present a method for integrating prior knowledge (such as phonetic and lexical knowledge), as well as acoustic context (e.g., the whole utterance), into phone posterior estimation, and we propose to use the resulting posteriors as complementary posterior features in a Tandem ASR configuration. These posteriors are estimated based on the HMM state posterior probability definition (typically used in standard HMM training). By integrating the appropriate prior knowledge and context in this way, we enhance the estimation of phone posteriors. These new posteriors are called "in-context" or HMM posteriors. We combine these posteriors, as complementary evidence, with the posteriors estimated by a Multi-Layer Perceptron (MLP), and use the combined evidence as features for training and inference in the Tandem configuration. This approach improves performance, compared to using only MLP-estimated posteriors as Tandem features, on the OGI Numbers, Conversational Telephone Speech (CTS), and Wall Street Journal (WSJ) databases.
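
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It computes HMM state posteriors, i.e. the probability of each state at frame t given the whole utterance, with a scaled forward-backward recursion, and then stacks them with the MLP posteriors as log features. Several things here are assumptions made only for illustration: one HMM state per phone, MLP posteriors used directly as emission scores (which presumes uniform phone priors), a hand-picked "sticky" transition matrix standing in for the prior knowledge, and simple concatenation as the combination rule. The function names `state_posteriors` and `tandem_features` and all numeric values are hypothetical.

```python
import math


def state_posteriors(emis, trans, init):
    """Scaled forward-backward pass.

    emis[t][i]  : emission score of frame t in state i (assumed > 0)
    trans[i][j] : transition probability from state i to state j
    init[i]     : initial state probability
    Returns gamma, where gamma[t][i] = P(state i at frame t | all frames).
    """
    T, N = len(emis), len(init)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[1.0] * N for _ in range(T)]

    # Forward recursion, normalised per frame to avoid underflow.
    for t in range(T):
        for j in range(N):
            if t == 0:
                prev = init[j]
            else:
                prev = sum(alpha[t - 1][i] * trans[i][j] for i in range(N))
            alpha[t][j] = prev * emis[t][j]
        scale = sum(alpha[t])
        alpha[t] = [a / scale for a in alpha[t]]

    # Backward recursion, also normalised per frame.
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(
                trans[i][j] * emis[t + 1][j] * beta[t + 1][j] for j in range(N)
            )
        scale = sum(beta[t])
        beta[t] = [b / scale for b in beta[t]]

    # Per-frame scaling constants cancel when each frame is renormalised.
    gamma = []
    for t in range(T):
        g = [alpha[t][i] * beta[t][i] for i in range(N)]
        total = sum(g)
        gamma.append([x / total for x in g])
    return gamma


def tandem_features(mlp_post, hmm_post, floor=1e-10):
    """Concatenate log MLP posteriors and log in-context posteriors per frame.

    Concatenation is an assumed combination rule, chosen for simplicity.
    """
    return [
        [math.log(max(p, floor)) for p in m + h]
        for m, h in zip(mlp_post, hmm_post)
    ]


# Toy utterance: 3 frames, 2 phones (hypothetical numbers).
mlp_post = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]]
trans = [[0.9, 0.1], [0.1, 0.9]]  # assumed prior: phones tend to persist
init = [0.5, 0.5]

gamma = state_posteriors(mlp_post, trans, init)
feats = tandem_features(mlp_post, gamma)
```

In this toy case the MLP alone slightly prefers the second phone at the middle frame, while the in-context posterior prefers the first phone because the neighbouring frames and the transition prior both support it. In a Tandem system the stacked features would typically be decorrelated and passed to a conventional GMM-HMM recogniser; that stage is omitted here.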

Name

haketa-rr-08-44.pdf

Access type

openaccess

Size

155.75 KB

Format

Adobe PDF

Checksum (MD5)

7c5bfd06fde71a6433f72bcf4a437b00