Infoscience

report

Acoustic and Lexical Resource Constrained ASR using Language-Independent Acoustic Model and Language-Dependent Probabilistic Lexical Model

Rasipuram, Ramya • Magimai.-Doss, Mathew
2014

One of the key challenges in building a statistical automatic speech recognition (ASR) system is modeling the relationship between lexical units (which are based on the subword units in the pronunciation lexicon) and acoustic feature observations. To model this relationship, two types of resources are needed: acoustic resources (speech signals with word-level transcriptions) and lexical resources (which transcribe each word in terms of subword units). Standard ASR systems typically use phonemes or phones as subword units. Not all languages have well-developed acoustic resources and phonetic lexical resources. In this paper, we show that modeling the relationship between lexical units and acoustic features can be factored into two parts through a latent variable, referred to as acoustic units: (a) an acoustic model that models the relationship between acoustic features and acoustic units, and (b) a lexical model that models the relationship between lexical units and acoustic units. Through this understanding, we elucidate that in a standard hidden Markov model (HMM) based ASR system the lexical model is deterministic (i.e., there exists a one-to-one relationship between lexical units and acoustic units), and that it is this deterministic lexical model that imposes the need for well-developed acoustic and lexical resources in the target language or domain when building an ASR system. We then propose an approach that addresses both acoustic resource and lexical resource constraints. More specifically, in the proposed approach the acoustic model models the relationship between acoustic features and multilingual phones (acoustic units) on target language-independent data, and the lexical model models a probabilistic relationship between lexical units based on graphemes and multilingual phones on a small amount of target language data.
We show the potential and the efficacy of the proposed approach through experiments and comparisons with other approaches on three different ASR tasks, namely, non-native accented speech recognition, rapid development of an ASR system for a new language, and development of an ASR system for a minority language.
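
The factorization described in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that the latent acoustic units are marginalized out as p(x | l) = Σ_d p(x | a_d) · P(a_d | l); the abstract does not state the scoring rule, and the report's Kullback-Leibler divergence based HMM may score frames differently. The unit inventory, the function name `lexical_unit_likelihood`, and all numbers are hypothetical.

```python
from typing import Sequence


def lexical_unit_likelihood(
    acoustic_likelihoods: Sequence[float],
    lexical_model_row: Sequence[float],
) -> float:
    """Combine the two models for one frame and one lexical unit.

    acoustic_likelihoods[d] plays the role of p(x | a_d) (acoustic model).
    lexical_model_row[d] plays the role of P(a_d | l) (lexical model).
    The latent acoustic unit is summed out (an assumption of this sketch).
    """
    if len(acoustic_likelihoods) != len(lexical_model_row):
        raise ValueError("both inputs must cover the same acoustic units")
    return sum(p_x * p_a for p_x, p_a in zip(acoustic_likelihoods, lexical_model_row))


# Hypothetical inventory of acoustic units (e.g. multilingual phones).
ACOUSTIC_UNITS = ["k", "s", "a"]

# Made-up acoustic model scores p(x | a_d) for a single frame.
frame_likelihoods = [0.6, 0.3, 0.1]

# Deterministic lexical model: the lexical unit maps to exactly one
# acoustic unit (a one-hot row), as in a standard HMM-based system.
deterministic_row = [1.0, 0.0, 0.0]

# Probabilistic lexical model: a grapheme-based lexical unit such as "c"
# spreads its mass over several acoustic units (values are illustrative).
probabilistic_row = [0.7, 0.3, 0.0]

det_score = lexical_unit_likelihood(frame_likelihoods, deterministic_row)
prob_score = lexical_unit_likelihood(frame_likelihoods, probabilistic_row)
```

With a one-hot row the combined score collapses to the score of a single acoustic unit, which is why the deterministic case ties lexical units directly to acoustic units. A non-degenerate row lets a grapheme-based unit draw on several multilingual phones at once.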

Type
report
Author(s)
Rasipuram, Ramya  
Magimai.-Doss, Mathew  
Date Issued
2014
Publisher
Idiap
Subjects
Automatic Speech Recognition • grapheme • Kullback-Leibler divergence based hidden Markov model • Lexical modeling • Lexicon • phoneme
URL
http://publications.idiap.ch/downloads/reports/2014/Rasipuram_Idiap-RR-02-2014.pdf
Written at
EPFL
EPFL units
LIDIAP
Available on Infoscience
April 28, 2014
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/102962