
Infoscience

EPFL, École polytechnique fédérale de Lausanne
research article

On quantifying the quality of acoustic models in hybrid DNN-HMM ASR

Dighe, Pranay • Asaei, Afsaneh • Bourlard, Herve
May 1, 2020
Speech Communication

We propose an information-theoretic framework for quantitative assessment of acoustic models used in hidden Markov model (HMM) based automatic speech recognition (ASR). The HMM backend expects that (i) the acoustic model yields accurate state-conditional emission probabilities for the observations at each time step, and (ii) the conditional probability distribution of the data given the underlying hidden state is independent of any other state in the sequence. The latter property is also known as the Markovian conditional independence assumption of HMM-based modeling. In this work, we cast HMM-based ASR as a communication channel in which the acoustic model computes the state emission probabilities as the input of the channel, and the channel outputs the most probable hidden state sequence. The quality of the acoustic model is thus quantified in terms of the amount of information transmitted through this channel, as well as how robust this channel is against the mismatch between the data and the HMM's conditional independence assumption. To formulate the required information-theoretic terms, we use the gamma posterior (or state occupancy) probabilities of HMM hidden states to derive a simple analysis framework that assesses the benefits and shortcomings of various acoustic models in HMM-based ASR. Our approach enables us to analyse acoustic modeling with Gaussian mixture models (GMM) as well as deep neural networks (DNN) with different numbers of hidden layers, without explicitly evaluating their ASR performance. As use cases, we apply our analysis to sequence-discriminatively trained DNN acoustic models as well as state-of-the-art recurrent and time-delay neural networks to compare their efficacy as acoustic models in HMM-based ASR. In addition, we use our analysis to study the contribution of sparse and low-dimensional models to enhancing acoustic modeling for better compliance with the HMM requirements.
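The "amount of information transmitted through this channel" can be illustrated with a small numerical sketch. The Python below is an assumption for illustration, not the authors' formulation: it builds a joint distribution between reference HMM states (assumed to be given, e.g. by a forced alignment) and the states indicated by per-frame gamma posteriors, then computes the mutual information of that table in bits. The function names `joint_from_gammas` and `mutual_information_bits` are hypothetical. It only touches the first criterion; it does not measure robustness to violations of the conditional independence assumption.

```python
import numpy as np


def joint_from_gammas(gammas, ref_states, num_states):
    """Hypothetical estimator of P(reference state i, posterior state j).

    gammas: (T, K) array; row t is the gamma (state occupancy) distribution
        at frame t.
    ref_states: length-T integer labels of the reference state per frame
        (ASSUMPTION: supplied externally, e.g. from a forced alignment).
    """
    gammas = np.asarray(gammas, dtype=float)
    ref_states = np.asarray(ref_states, dtype=int)
    joint = np.zeros((num_states, num_states))
    # Add each frame's posterior row into the row of its reference state;
    # np.add.at accumulates correctly when a state index repeats.
    np.add.at(joint, ref_states, gammas)
    return joint / joint.sum()


def mutual_information_bits(joint):
    """Mutual information I(X; Y) in bits for a joint probability table."""
    px = joint.sum(axis=1, keepdims=True)  # marginal over reference states
    py = joint.sum(axis=0, keepdims=True)  # marginal over posterior states
    mask = joint > 0                        # 0 * log 0 is taken as 0
    ratio = joint[mask] / (px @ py)[mask]   # px @ py is the outer product
    return float(np.sum(joint[mask] * np.log2(ratio)))


if __name__ == "__main__":
    refs = [0, 1, 2, 3, 0, 1, 2, 3]
    # A "perfect" model: each gamma row is one-hot on the reference state.
    perfect = np.eye(4)[refs]
    print(mutual_information_bits(joint_from_gammas(perfect, refs, 4)))  # prints 2.0
    # An uninformative model: uniform posteriors carry no information.
    uniform = np.full((8, 4), 0.25)
    print(mutual_information_bits(joint_from_gammas(uniform, refs, 4)))  # prints 0.0
```

With four equiprobable states, a model whose posteriors always match the reference reaches the maximum of log2(4) = 2 bits, while uniform posteriors transmit nothing; real acoustic models fall in between.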

Type: research article
DOI: 10.1016/j.specom.2020.03.001
Web of Science ID: WOS:000531017100003
Author(s): Dighe, Pranay; Asaei, Afsaneh; Bourlard, Herve
Date Issued: 2020-05-01
Publisher: ELSEVIER
Published in: Speech Communication
Volume: 119
Start page: 24
End page: 35
Subjects: Acoustics; Computer Science, Interdisciplinary Applications; Computer Science; automatic speech recognition; acoustic modeling; information theory; deep neural networks; conditional mutual information; low-rank and sparsity; speech recognition
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: LIDIAP, LIONS
Available on Infoscience: May 21, 2020
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/168845

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.