Infoscience

Logo EPFL, École polytechnique fédérale de Lausanne

report

Robust speech recognition based on multi-stream processing

Hagen, Astrid
2001

Despite sophisticated present-day automatic speech recognition (ASR) techniques, a single recognizer is usually incapable of accounting for the varying conditions in a typical natural environment. Higher robustness to a range of noise conditions can potentially be achieved by combining the results of several recognizers operating in parallel. One such approach is multi-band processing, which mimics the parallel processing of frequency subbands that Fletcher claimed takes place in human speech recognition. However, recent findings in both human and automatic speech recognition have revealed shortcomings of the original multi-band ASR approach, such as its assumption that frequency subbands are independent, which often lead to reduced performance for clean speech and wide-band noise. To overcome this problem, we propose and investigate a new set of "full combination" rules that integrate acoustic models trained on all possible combinations of subbands, preserving correlation information and yielding higher performance in all noise conditions. In this development, particular attention was given to grounding all of the rules in statistical theory, so that the assumptions required by each model become clear. The new combination strategies are developed for both posterior-based and likelihood-based systems. They are then also applied to the combination of diverse feature streams, for example streams derived from multi-time-scale analysis, which results in better exploitation of the commonly used instantaneous and time-difference features. While combination may give the same weight to each expert, the robustness of a multi-stream system can be further enhanced when each stream expert is assigned a weight reflecting its reliability.
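To make the idea of a posterior-based "full combination" rule more concrete, here is a minimal sketch. It assumes that one expert is trained per subset of subbands (2^B experts for B subbands), that the empty subset contributes the class priors, and that the combined posterior is a normalized weighted sum of the experts' class posteriors. These are illustrative assumptions, not the report's exact derivation. The names `all_subband_subsets` and `full_combination` are hypothetical, and the equal weights in the example are placeholders rather than one of the report's weighting schemes.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence

Subset = FrozenSet[int]


def all_subband_subsets(num_bands: int) -> List[Subset]:
    """Enumerate all 2**num_bands subsets of subband indices, including the empty set."""
    bands = range(num_bands)
    return [frozenset(c) for r in range(num_bands + 1) for c in combinations(bands, r)]


def full_combination(
    expert_posteriors: Dict[Subset, Sequence[float]],
    weights: Dict[Subset, float],
) -> List[float]:
    """Combine per-subset class posteriors as a normalized weighted sum.

    Hypothetical interface: one posterior vector per subband subset and one
    non-negative weight per subset (a missing weight counts as zero).
    """
    if not expert_posteriors:
        raise ValueError("need at least one expert")
    num_classes = len(next(iter(expert_posteriors.values())))
    # Normalize over the weights of the experts that are actually present.
    total_w = sum(weights.get(s, 0.0) for s in expert_posteriors)
    if total_w <= 0.0:
        raise ValueError("weights of the available experts must sum to a positive value")
    combined = [0.0] * num_classes
    for subset, posterior in expert_posteriors.items():
        if len(posterior) != num_classes:
            raise ValueError("all experts must output the same number of classes")
        w = weights.get(subset, 0.0) / total_w
        for k, p in enumerate(posterior):
            combined[k] += w * p
    return combined


if __name__ == "__main__":
    # Two subbands -> 4 experts: {}, {0}, {1}, {0, 1}; two classes.
    subsets = all_subband_subsets(2)
    posteriors = {
        frozenset(): [0.5, 0.5],  # assumed: class priors for the empty subset
        frozenset({0}): [0.8, 0.2],
        frozenset({1}): [0.3, 0.7],
        frozenset({0, 1}): [0.9, 0.1],
    }
    equal = {s: 1.0 for s in subsets}
    print(full_combination(posteriors, equal))  # approximately [0.625, 0.375]
```

Because every subset of subbands has its own expert, correlations within each subset are modeled directly rather than assumed away, which is the property the abstract attributes to the full combination rules. The cost is that the number of experts grows as 2^B.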
The new combination techniques are tested with several fixed and adaptive weighting strategies, including relative frequency of correct classification, least mean squared error, local signal-to-noise ratio, and maximum-likelihood-based weights. We show that the new multi-band approaches, which are consistently trained on clean speech, outperform the original multi-band ASR models on both clean and noisy speech. Multi-band processing improves over the baseline full-band recognizer only in the case of narrow-band noise. However, combining multiple data streams from different time scales, using the same "full combination" rules, has also been shown to significantly improve over the baseline in wide-band factory noise.
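One of the fixed weighting strategies listed above, relative frequency of correct classification, can be sketched as follows. This assumes that each stream expert is scored on a held-out labeled set by how often its highest-posterior class matches the reference label, and that the weights are these accuracies normalized to sum to one; the report's exact definition may differ. The function names and the data layout are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def correct_classification_weights(
    expert_outputs: Dict[Hashable, List[Sequence[float]]],
    labels: Sequence[int],
) -> Dict[Hashable, float]:
    """Weight each expert by its relative frequency of correct frame classification.

    Hypothetical interface: expert_outputs maps an expert name to one posterior
    vector per held-out frame; labels holds the reference class of each frame.
    """
    if not labels:
        raise ValueError("need at least one labeled frame")
    accuracies = {}
    for name, frames in expert_outputs.items():
        if len(frames) != len(labels):
            raise ValueError(f"expert {name!r} has {len(frames)} frames, expected {len(labels)}")
        correct = sum(1 for post, y in zip(frames, labels) if argmax(post) == y)
        accuracies[name] = correct / len(labels)
    total = sum(accuracies.values())
    if total == 0.0:
        # No expert is ever correct: fall back to equal weights.
        return {name: 1.0 / len(accuracies) for name in accuracies}
    return {name: acc / total for name, acc in accuracies.items()}


if __name__ == "__main__":
    labels = [0, 1, 1, 0]
    outputs = {
        "low_band": [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]],  # 3 of 4 correct
        "high_band": [[0.4, 0.6], [0.3, 0.7], [0.1, 0.9], [0.45, 0.55]],  # 2 of 4 correct
    }
    # approximately {'low_band': 0.6, 'high_band': 0.4}
    print(correct_classification_weights(outputs, labels))
```

Weights of this kind are fixed once estimated. Adaptive schemes, such as the local signal-to-noise ratio weights mentioned above, would instead recompute the weights frame by frame from the incoming signal.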

Name: rr01-41.pdf
Access type: open access
Size: 2.05 MB
Format: Adobe PDF
Checksum (MD5): b5b204d9c1cb0e1704fdb5a4085dd349


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.