Report

A Bayesian Switching Linear Dynamical System for Scale-Invariant robust speech extraction

Most state-of-the-art automatic speech recognition (ASR) systems deal with environmental noise by extracting noise-robust features, which are subsequently modelled by a Hidden Markov Model (HMM). A limitation of this feature-based approach is that the influence of noise on the features is difficult to model explicitly, and the HMM is typically oversensitive, dealing poorly with unexpected and severe noise environments. An alternative is to model the raw signal directly, which has the potential advantage of allowing noise to be modelled explicitly. A popular way to model raw speech signals is to use an Autoregressive (AR) process. AR models are, however, very sensitive to variations in the amplitude of the signal. Our proposed Bayesian Autoregressive Switching Linear Dynamical System (BAR-SLDS) treats the observed noisy signal as a scaled, clean hidden signal plus noise. The noise variance and the signal scaling factor are automatically adapted, enabling the robust identification of scale-invariant clean signals in the presence of noise.
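
As a rough illustration of the observation model described above (a scaled, clean AR signal plus noise), the following Python sketch simulates such a signal and shows one way the amplitude sensitivity of a fixed AR model can appear. It is not the BAR-SLDS inference procedure. The AR order, coefficients, gain, noise level, and all function names are assumptions chosen for the example and do not come from the report.

```python
import numpy as np


def simulate_ar(coeffs, n, innovation_std, rng):
    """Draw n samples of x_t = sum_k coeffs[k] * x_{t-1-k} + e_t, e_t ~ N(0, std^2)."""
    p = len(coeffs)
    x = np.zeros(n + p)  # the first p zeros act as the initial state
    for t in range(p, n + p):
        past = x[t - p:t][::-1]  # (x_{t-1}, ..., x_{t-p})
        x[t] = np.dot(coeffs, past) + innovation_std * rng.standard_normal()
    return x[p:]


def lagged_matrix(x, order):
    """Rows are (x_{t-1}, ..., x_{t-order}) for t = order, ..., len(x) - 1."""
    return np.column_stack(
        [x[order - k - 1:len(x) - k - 1] for k in range(order)]
    )


def fit_ar(x, order):
    """Least-squares estimate of the AR coefficients."""
    coeffs, *_ = np.linalg.lstsq(lagged_matrix(x, order), x[order:], rcond=None)
    return coeffs


def ar_log_likelihood(x, coeffs, innovation_std):
    """Gaussian log-likelihood of x[p:] given x[:p] under a fixed AR model."""
    residuals = x[len(coeffs):] - lagged_matrix(x, len(coeffs)) @ coeffs
    var = innovation_std ** 2
    return -0.5 * np.sum(residuals ** 2 / var + np.log(2 * np.pi * var))


rng = np.random.default_rng(0)
true_coeffs = np.array([1.5, -0.7])  # assumed stable AR(2) coefficients
clean = simulate_ar(true_coeffs, 2000, 1.0, rng)

gain, noise_std = 5.0, 0.5  # assumed scaling factor and noise level
# Observation model from the abstract: scaled clean signal plus noise.
observed = gain * clean + noise_std * rng.standard_normal(clean.shape)

# Rescaling leaves the least-squares AR coefficients unchanged...
coeffs_clean = fit_ar(clean, 2)
coeffs_scaled = fit_ar(gain * clean, 2)

# ...but a model with a fixed innovation variance scores the rescaled
# signal far worse, even though its dynamics are identical.
ll_clean = ar_log_likelihood(clean, true_coeffs, 1.0)
ll_scaled = ar_log_likelihood(gain * clean, true_coeffs, 1.0)
```

In this sketch the fitted coefficients agree for the clean and rescaled signals, while the log-likelihood under the fixed-variance model drops sharply for the rescaled one. That gap is one way to read the amplitude sensitivity mentioned above, and it suggests why adapting the scaling factor and noise variance could help.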
