Multiple Timescale Feature Combination towards Robust Speech Recognition
Although considerable progress has been made in Automatic Speech Recognition (ASR) in recent years, robustness remains one of the main open problems. State-of-the-art ASR systems typically work very well in well-defined environments, e.g. for clean speech or known noise conditions, but their performance degrades drastically under other conditions. Many approaches have been developed to address this problem, ranging from noise cancellation to system adaptation techniques. This paper investigates how using additional information from relatively long timescales affects noise robustness, and introduces the multiple timescale feature combination approach. Experiments show that robustness in noisy conditions can be improved while recognition performance on clean speech is maintained.