Infoscience
EPFL, École polytechnique fédérale de Lausanne
research article

Low-rank and sparse subspace modeling of speech for DNN based acoustic modeling

Dighe, Pranay • Asaei, Afsaneh • Bourlard, Herve
May 1, 2019
Speech Communication

Towards the goal of improving acoustic modeling for automatic speech recognition (ASR), this work investigates the modeling of senone subspaces in deep neural network (DNN) posteriors using low-rank and sparse modeling approaches. While DNN posteriors are typically very high-dimensional, recent studies have shown that the true class information is actually embedded in low-dimensional subspaces. Thus, a matrix of all posteriors belonging to a particular senone class is expected to have a very low rank. In this paper, we exploit Principal Component Analysis and Compressive Sensing-based dictionary learning for low-rank and sparse modeling of senone subspaces, respectively. Our hypothesis is that the principal components of the DNN posterior space (termed eigen-posteriors in this work) and Compressive Sensing dictionaries can act as suitable models to extract the well-structured, low-dimensional latent information and discard the undesirable high-dimensional unstructured noise present in the posteriors. Enhanced DNN posteriors thus obtained are used as soft targets for training better acoustic models to improve ASR. In this context, our approach also enables improving distant speech recognition by mapping far-field acoustic features to low-dimensional senone subspaces learned from near-field features. Experiments are performed on the AMI Meeting corpus in both close-talk (IHM) and far-field (SDM) microphone settings, where acoustic models trained using enhanced DNN posteriors outperform the conventional hard-target-based hybrid DNN-HMM systems. An information-theoretic analysis is also presented to show how low-rank and sparse enhancement modifies the DNN posterior space to better match the assumptions of the hidden Markov model (HMM) back-end.
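The low-rank (PCA) branch of the approach described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a matrix of DNN posterior vectors for a single senone class, computes its principal components (the "eigen-posteriors"), and reconstructs each posterior from the top-k components so that unstructured high-dimensional noise is discarded. The function name, the choice of k, and the final clip-and-renormalize step are assumptions for the sketch.

```python
import numpy as np

def enhance_posteriors_pca(P, k):
    """Low-rank enhancement of DNN posteriors via PCA (illustrative sketch).

    P: (n_frames, n_senones) matrix of posterior vectors, e.g. all frames
       aligned to one senone class.
    k: number of principal components ("eigen-posteriors") to retain.
    Returns an enhanced posterior matrix of the same shape.
    """
    mean = P.mean(axis=0)
    Pc = P - mean
    # SVD of the centered posteriors; rows of Vt are the principal
    # directions of the posterior space (the eigen-posteriors).
    U, s, Vt = np.linalg.svd(Pc, full_matrices=False)
    # Keep only the top-k components and reconstruct: this projects each
    # posterior onto the low-dimensional senone subspace.
    P_low = (U[:, :k] * s[:k]) @ Vt[:k] + mean
    # Reconstructions can leave the simplex; clip and renormalize so each
    # row is again a valid probability distribution (soft target).
    P_low = np.clip(P_low, 1e-8, None)
    return P_low / P_low.sum(axis=1, keepdims=True)
```

The enhanced rows could then serve as soft targets when retraining the acoustic model, in place of the usual one-hot (hard) senone labels.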

Type
research article
DOI
10.1016/j.specom.2019.03.004
Web of Science ID
WOS:000472684400004
Author(s)
Dighe, Pranay; Asaei, Afsaneh; Bourlard, Herve
Date Issued
2019-05-01
Publisher
Elsevier Science BV
Published in
Speech Communication
Volume
109
Pages
34–45
Subjects
Acoustics • Computer Science, Interdisciplinary Applications • Computer Science • automatic speech recognition • acoustic modeling • deep neural networks • low-rank and sparsity • deep neural-network • query
Editorial or Peer reviewed
REVIEWED
Written at
EPFL
EPFL units
LIDIAP • LIONS
Available on Infoscience
July 7, 2019
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/158910
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.