HMM/ANN Based Spectral Peak Location Estimation for Noise Robust Speech Recognition

In this paper, we present an HMM/ANN based algorithm for estimating spectral peak locations. The algorithm exploits distinct time-frequency (TF) patterns in the spectrogram to estimate the peak locations. Such use of TF patterns is expected to impose temporal constraints on the peak estimation task, thereby yielding peak estimates that vary more smoothly over time. Additionally, the algorithm uses an ergodic topology for the HMM/ANN, which allows the number of estimated peak locations to vary over time. The usefulness of the proposed algorithm is evaluated in the framework of a recently introduced noise robust feature, the spectro-temporal activity pattern (STAP) feature. Interestingly, the recently introduced phase autocorrelation (PAC) spectrum, with its enhanced spectral peaks and smoothed spectral valleys, turns out to be more appropriate for this algorithm than the regular spectrum.
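To make the decoding idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes a two-state ergodic (fully connected) HMM with hypothetical states "non-peak" (0) and "peak" (1), assumes that decoding runs along the frequency bins of a single frame, and uses made-up per-bin peak posteriors in place of real ANN outputs. Posteriors are used directly as emission scores for simplicity, whereas hybrid HMM/ANN systems commonly divide them by state priors. All function names, transition probabilities, and numbers are illustrative assumptions.

```python
import math


def viterbi(log_emit, log_trans, log_init):
    """Most likely state path through a fully connected (ergodic) HMM.

    log_emit[t][s]  : log emission score of state s at step t
                      (here a stand-in for ANN outputs; an assumption)
    log_trans[i][j] : log probability of moving from state i to state j
    log_init[s]     : log probability of starting in state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][j] = best predecessor of state j at step t
    for t in range(1, len(log_emit)):
        new_score, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j]
                             + log_emit[t][j])
            ptr.append(best_i)
        score = new_score
        back.append(ptr)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def peak_locations(path, peak_state=1):
    """Centre index of every run of consecutive peak-state steps."""
    peaks, start = [], None
    for k, s in enumerate(path + [None]):  # sentinel closes a trailing run
        if s == peak_state and start is None:
            start = k
        elif s != peak_state and start is not None:
            peaks.append((start + k - 1) // 2)
            start = None
    return peaks


# Hypothetical per-bin posteriors P(peak | TF pattern); not real data.
p_peak = [0.1, 0.1, 0.9, 0.9, 0.9, 0.1, 0.6, 0.1, 0.9, 0.9, 0.1]
log_emit = [[math.log(1.0 - p), math.log(p)] for p in p_peak]

# Ergodic topology: every state can follow every state.
# The 0.8 / 0.2 values are assumed, chosen to favour staying in a state.
log_trans = [[math.log(0.8), math.log(0.2)],
             [math.log(0.2), math.log(0.8)]]
log_init = [math.log(0.5), math.log(0.5)]

path = viterbi(log_emit, log_trans, log_init)
peaks = peak_locations(path)
print(path)   # [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0]
print(peaks)  # [3, 8]
```

In this toy input, the isolated bin with posterior 0.6 is not decoded as a peak, because entering and leaving the peak state costs more than the weak evidence gains. This illustrates how transition constraints can smooth the estimate. Because the topology is ergodic, the number of peak runs is not fixed in advance and depends only on the input.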
