On Variable-Scale Piecewise Stationary Spectral Analysis of Speech Signals for ASR

Tyagi, Vivek • Bourlard, Hervé • Wellekens, Christian
2005

It is often acknowledged that speech signals contain short-term and long-term temporal properties that are difficult to capture and model with the usual fixed-scale (typically 20 ms) short-time spectral analysis used in hidden Markov models (HMMs), which rests on piecewise stationarity and state-conditional independence assumptions about the acoustic vectors. For example, vowels are typically quasi-stationary over 40-80 ms segments, while plosives typically require analysis windows shorter than 20 ms. Fixed-scale analysis is therefore sub-optimal with respect to time-frequency resolution and to the modeling of the different stationary phones found in the speech signal. In the present paper, we investigate the potential advantages of using variable-size analysis windows for improving state-of-the-art speech recognition systems. Based on the usual assumption that the speech signal can be modeled by a varying autoregressive (AR) Gaussian process, we estimate the largest piecewise quasi-stationary speech segments, based on the likelihood that a segment was generated by the same AR process. This likelihood is estimated from the Linear Prediction (LP) residual error. Each of these quasi-stationary segments is then used as an analysis window from which spectral features are extracted. The approach thus yields a variable-scale spectral analysis that adaptively estimates the largest analysis window over which the signal remains quasi-stationary, and hence the best time/frequency resolution tradeoff. Speech recognition experiments on the OGI Numbers95 database show that the proposed multi-scale piecewise stationary spectral analysis based features yield improved recognition accuracy in clean conditions, compared to features based on the minimum cross entropy spectrum as well as those based on fixed-scale spectral analysis.
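
The segmentation idea in the abstract can be illustrated with a minimal sketch. It is not the report's exact procedure. It assumes a least-squares LP fit and a log generalized likelihood ratio (GLR) computed from LP residual energies, which compares "one shared AR model" against "two separate AR models" for a pair of adjacent segments. The window is grown greedily until the ratio exceeds a threshold. All function names (regression_rows, residual_energy, glr, segment, ar2) and all parameter values (LP order 4, 160-sample blocks, i.e. 20 ms at an assumed 8 kHz sampling rate, threshold 30) are hypothetical choices for illustration, and the test signal is synthetic AR(2) noise rather than speech.

```python
import numpy as np

def regression_rows(x, order):
    """Stack past samples so that row n predicts x[n] from x[n-1..n-order]."""
    x = np.asarray(x, dtype=float)
    past = np.column_stack(
        [x[order - k - 1 : len(x) - k - 1] for k in range(order)]
    )
    return past, x[order:]

def residual_energy(past, target):
    """Least-squares LP fit; returns the summed squared residual error."""
    coeffs, *_ = np.linalg.lstsq(past, target, rcond=None)
    err = target - past @ coeffs
    return max(float(err @ err), 1e-12)  # floor avoids log(0)

def glr(left, right, order=4):
    """Log likelihood ratio: two separate AR models vs. one shared model.

    Values near zero suggest both segments fit the same AR process.
    """
    p1, t1 = regression_rows(left, order)
    p2, t2 = regression_rows(right, order)
    n1, n2 = len(t1), len(t2)
    n0 = n1 + n2
    e1 = residual_energy(p1, t1)
    e2 = residual_energy(p2, t2)
    # Shared model: one coefficient set fitted to the rows of both segments.
    e0 = residual_energy(np.vstack([p1, p2]), np.concatenate([t1, t2]))
    return 0.5 * (n0 * np.log(e0 / n0)
                  - n1 * np.log(e1 / n1)
                  - n2 * np.log(e2 / n2))

def segment(signal, block=160, step=80, order=4, threshold=30.0):
    """Greedy sketch: grow the window until the next block no longer fits it."""
    bounds = [0]
    start, end = 0, block
    while end + block <= len(signal):
        if glr(signal[start:end], signal[end:end + block], order) > threshold:
            bounds.append(end)          # close the quasi-stationary segment
            start, end = end, end + block
        else:
            end += step                 # still stationary: enlarge the window
    bounds.append(len(signal))
    return bounds

def ar2(n, angle, rng, radius=0.95):
    """Synthetic AR(2) noise with a resonance at the given pole angle."""
    a1, a2 = 2 * radius * np.cos(angle), -radius ** 2
    x = np.zeros(n + 2)
    w = rng.standard_normal(n)
    for i in range(n):
        x[i + 2] = a1 * x[i + 1] + a2 * x[i] + w[i]
    return x[2:]

# Two different AR processes joined at sample 2000 (illustrative data only).
rng = np.random.default_rng(0)
signal = np.concatenate([ar2(2000, 0.3, rng), ar2(2000, 1.5, rng)])
bounds = segment(signal)
same = glr(signal[0:400], signal[400:800])        # both from process 1
diff = glr(signal[1600:2000], signal[2000:2400])  # straddles the change
print(bounds, same, diff)
```

Each pair of consecutive entries in bounds delimits one variable-length analysis window, from which spectral features could then be computed. In practice the threshold, LP order and block size would need tuning on real speech.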

Type: report
Author(s): Tyagi, Vivek • Bourlard, Hervé • Wellekens, Christian
Date Issued: 2005
Publisher: IDIAP
Subjects: Speech
URL: http://publications.idiap.ch/downloads/reports/2005/RR-05-19.pdf
Written at: EPFL
EPFL units: LIDIAP
Available on Infoscience: March 10, 2006
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/228742