Low-rank and Sparse Soft Targets to Learn Better DNN Acoustic Models

Conventional deep neural networks (DNN) for speech acoustic modeling rely on Gaussian mixture models (GMM) and hidden Markov models (HMM) to obtain binary class labels as targets for DNN training. The subword classes in speech recognition systems correspond to context-dependent tied states, or senones. The present work addresses some limitations of GMM-HMM senone alignments for DNN training. We hypothesize that the senone probabilities obtained from a DNN trained with binary labels can provide more accurate targets for learning better acoustic models. However, DNN outputs carry inaccuracies that manifest as high-dimensional unstructured noise, whereas the informative components are structured and low-dimensional. We exploit principal component analysis (PCA) and sparse coding to characterize the senone subspaces. Enhanced probabilities obtained from low-rank and sparse reconstructions are used as soft targets for DNN acoustic modeling, which also enables training with untranscribed data. Experiments conducted on the AMI corpus show a 4.6% relative reduction in word error rate.
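As a rough illustration of the idea described in the abstract (not the paper's exact recipe), the sketch below reconstructs a matrix of frame-level senone posteriors with scikit-learn's PCA and dictionary learning; the rank, dictionary size, clipping, and renormalization choices here are assumptions made for the example.

import numpy as np
from sklearn.decomposition import PCA, DictionaryLearning

def low_rank_targets(posteriors, rank=10):
    """Reconstruct posteriors from their top `rank` principal components."""
    pca = PCA(n_components=rank)
    recon = pca.inverse_transform(pca.fit_transform(posteriors))
    recon = np.clip(recon, 0.0, None)            # PCA reconstruction can go negative
    return recon / (recon.sum(axis=1, keepdims=True) + 1e-12)

def sparse_targets(posteriors, n_atoms=20, alpha=1.0):
    """Reconstruct posteriors from sparse codes over a learned dictionary."""
    dl = DictionaryLearning(n_components=n_atoms, alpha=alpha)
    codes = dl.fit_transform(posteriors)         # sparse coefficients per frame
    recon = np.clip(codes @ dl.components_, 0.0, None)
    return recon / (recon.sum(axis=1, keepdims=True) + 1e-12)

# Toy usage with random stand-ins for DNN senone posteriors
rng = np.random.default_rng(0)
posteriors = rng.dirichlet(np.ones(100), size=500)   # shape: (frames, senones)
soft = low_rank_targets(posteriors, rank=10)         # enhanced soft targets

In the paper's setting, such reconstructions would replace the binary GMM-HMM alignment labels as soft training targets; the renormalization step simply keeps each row a valid probability distribution.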


Published in:
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5265-5269
Presented at:
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017)
Year:
2017
Publisher:
New York: IEEE
ISSN:
1520-6149
ISBN:
978-1-5090-4117-6