Low-rank and Sparse Soft Targets to Learn Better DNN Acoustic Models

Conventional deep neural networks (DNN) for speech acoustic modeling rely on Gaussian mixture models (GMM) and hidden Markov models (HMM) to obtain binary class labels as targets for DNN training. The subword classes in speech recognition systems correspond to context-dependent tied states, or senones. The present work addresses some limitations of GMM-HMM senone alignments for DNN training. We hypothesize that the senone probabilities obtained from a DNN trained with binary labels can provide more accurate targets for learning better acoustic models. However, DNN outputs carry inaccuracies that manifest as high-dimensional unstructured noise, whereas the informative components are structured and low-dimensional. We exploit principal component analysis (PCA) and sparse coding to characterize the senone subspaces. Enhanced probabilities obtained from low-rank and sparse reconstructions are used as soft targets for DNN acoustic modeling, which also enables training with untranscribed data. Experiments conducted on the AMI corpus show a 4.6% relative reduction in word error rate.
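As a rough illustration of the idea described in the abstract (not the paper's exact recipe), the sketch below reconstructs a matrix of frame-level senone posteriors with scikit-learn's PCA and dictionary learning; the rank, dictionary size, clipping, and renormalization choices here are assumptions made for the example.

import numpy as np
from sklearn.decomposition import PCA, DictionaryLearning

def low_rank_targets(posteriors, rank=10):
    """Reconstruct posteriors from their top `rank` principal components."""
    pca = PCA(n_components=rank)
    recon = pca.inverse_transform(pca.fit_transform(posteriors))
    recon = np.clip(recon, 0.0, None)            # PCA reconstruction can go negative
    return recon / (recon.sum(axis=1, keepdims=True) + 1e-12)

def sparse_targets(posteriors, n_atoms=20, alpha=1.0):
    """Reconstruct posteriors from sparse codes over a learned dictionary."""
    dl = DictionaryLearning(n_components=n_atoms, alpha=alpha)
    codes = dl.fit_transform(posteriors)         # sparse coefficients per frame
    recon = np.clip(codes @ dl.components_, 0.0, None)
    return recon / (recon.sum(axis=1, keepdims=True) + 1e-12)

# Toy usage with random stand-ins for DNN senone posteriors
rng = np.random.default_rng(0)
posteriors = rng.dirichlet(np.ones(100), size=500)   # shape: (frames, senones)
soft = low_rank_targets(posteriors, rank=10)         # enhanced soft targets

In the paper's setting, such reconstructions would replace the binary GMM-HMM alignment labels as soft training targets; the renormalization step simply keeps each row a valid probability distribution.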


Published in:
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5265-5269
Presented at:
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017)
Year:
2017
Publisher:
New York: IEEE
ISSN:
1520-6149
ISBN:
978-1-5090-4117-6