Crosslingual Tandem-SGMM: Exploiting Out-Of-Language Data for Acoustic Model and Feature Level Adaptation

Recent studies have shown that speech recognizers may benefit from data in languages other than the target language through efficient acoustic model-level or feature-level adaptation. We show that crosslingual Tandem-Subspace Gaussian Mixture Models (SGMMs) can successfully combine both types of adaptation. We focus on under-resourced languages (Afrikaans in our case) and perform feature-level adaptation by estimating phone class posterior features with a Multilayer Perceptron trained on data from a similar language for which large amounts of speech data are available (Dutch in our case). The same Dutch data can also be exploited at the acoustic model level by training the globally shared SGMM parameters crosslingually. The two adaptation techniques prove complementary: the resulting crosslingual Tandem-SGMM system yields a relative improvement of about 22% over a standard speech recognizer on an Afrikaans phoneme recognition task. Interestingly, a subsequent score-level combination of the individual SGMM systems yields an additional 3% relative improvement.
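The two adaptation levels can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. It assumes a generic Tandem recipe (log MLP phone posteriors appended to each acoustic frame) and the usual SGMM parameterization, in which each Gaussian mean is a globally shared projection `M_i` applied to a low-dimensional state vector `v_j` and the mixture weights are a softmax of shared vectors `w_i` dotted with `v_j` (simplified here to one substate per state). All function names, the MLP outputs and the matrices are hypothetical placeholders.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mat_vec(matrix: Sequence[Sequence[float]], vec: Sequence[float]) -> Vector:
    """Plain matrix-vector product: each row of `matrix` dotted with `vec`."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


# --- Feature level: Tandem features (hypothetical helper) ----------------
def tandem_features(acoustic: Sequence[float], mlp_logits: Sequence[float],
                    floor: float = 1e-10) -> Vector:
    """Append log phone posteriors to one frame of acoustic features.

    `mlp_logits` stands in for the output layer of an MLP assumed to be
    trained on the non-target language (Dutch). Practical Tandem systems
    usually also decorrelate or reduce the log posteriors (e.g. with PCA);
    that step is omitted here.
    """
    posteriors = softmax(mlp_logits)
    log_posteriors = [math.log(max(p, floor)) for p in posteriors]
    return list(acoustic) + log_posteriors


# --- Acoustic model level: SGMM state parameters (hypothetical helpers) --
def sgmm_mean(projection_i: Sequence[Sequence[float]],
              state_vector: Sequence[float]) -> Vector:
    """Mean of Gaussian i in state j: mu_ji = M_i v_j, with M_i shared by all states."""
    return mat_vec(projection_i, state_vector)


def sgmm_weights(weight_projections: Sequence[Sequence[float]],
                 state_vector: Sequence[float]) -> Vector:
    """Weights w_ji = exp(w_i . v_j) / sum_k exp(w_k . v_j), with the w_i shared."""
    return softmax(mat_vec(weight_projections, state_vector))


# Usage with made-up toy values: 2 acoustic dims + 3 phone classes = 5-dim frame.
frame = tandem_features([1.0, -0.5], [2.0, 0.0, -1.0])
M_0 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]  # 5 x 2 shared projection
v_j = [0.2, -0.1]                                                    # 2-dim state vector
mu = sgmm_mean(M_0, v_j)                  # lies in the same 5-dim Tandem feature space
weights = sgmm_weights([[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]], v_j)  # 3 Gaussians
```

Under these assumptions, only the small state vectors `v_j` need target-language (Afrikaans) data, while the shared `M_i` and `w_i` (and the shared covariances, not shown) can be estimated from the larger Dutch corpus, which is what makes crosslingual training of the shared SGMM parameters attractive for an under-resourced language.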

Presented at:
Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), ISCA, Lyon, France

 Record created 2013-12-19, last modified 2018-01-28