000189423 001__ 189423
000189423 005__ 20180913062056.0
000189423 0247_ $$2doi$$a10.1109/TASL.2013.2260151
000189423 022__ $$a1558-7916
000189423 02470 $$2ISI$$a000319020800004
000189423 037__ $$aARTICLE
000189423 245__ $$aRobust Log-Energy Estimation and its Dynamic Change Enhancement for In-car Speech Recognition
000189423 260__ $$aPiscataway$$bIEEE-Inst Electrical Electronics Engineers Inc$$c2013
000189423 269__ $$a2013
000189423 300__ $$a10
000189423 336__ $$aJournal Articles
000189423 520__ $$aThe log-energy parameter, typically derived from a full-band spectrum, is a critical feature commonly used in automatic speech recognition (ASR) systems. However, log-energy is difficult to estimate reliably in the presence of background noise. In this paper, we theoretically show that background noise affects the trajectories of not only the "conventional" log-energy, but also its delta parameters. This results in a poor estimation of the actual log-energy and its delta parameters, which no longer describe the speech signal. We thus propose a new method to estimate log-energy from a sub-band spectrum, followed by dynamic change enhancement and mean smoothing. We demonstrate the effectiveness of the proposed log-energy estimation and its post-processing steps through speech recognition experiments conducted on the in-car CENSREC-2 database. The proposed log-energy (together with its corresponding delta parameters) yields an average improvement of 32.8% compared with the baseline front-ends. Moreover, it is also shown that further improvement can be achieved by incorporating the new Mel-Frequency Cepstral Coefficients (MFCCs) obtained by non-linear spectral contrast stretching.
000189423 6531_ $$aDynamic change enhancement
000189423 6531_ $$ain-car speech recognition
000189423 6531_ $$alog-energy
000189423 6531_ $$amel-filterbank (MFB)
000189423 6531_ $$amel-frequency cepstral coefficients (MFCCs)
000189423 700__ $$0242359$$aLi, Weifeng$$g188567$$uTsinghua Univ, Shenzhen Key Lab Informat Sci & Technol, Dept Elect Engn, Grad Sch Shenzhen, Shenzhen 518055, Peoples R China
000189423 700__ $$aWang, Longbiao$$uNagaoka Univ Technol, Nagaoka, Niigata 9402188, Japan
000189423 700__ $$aZhou, Yicong$$uUniv Macau, Dept Comp & Informat Sci, Macao, Peoples R China
000189423 700__ $$aBourlard, Hervé$$uIdiap Res Inst, CH-1015 Lausanne, Switzerland
000189423 700__ $$aLiao, Qingmin$$uTsinghua Univ, Shenzhen Key Lab Informat Sci & Technol, Dept Elect Engn, Grad Sch Shenzhen, Shenzhen 518055, Peoples R China
000189423 773__ $$j21$$k8$$q1689-1698$$tIEEE Transactions on Audio, Speech, and Language Processing
000189423 909C0 $$0252189$$pLIDIAP$$xU10381
000189423 909CO $$ooai:infoscience.tind.io:189423$$pSTI$$particle
000189423 937__ $$aEPFL-ARTICLE-189423
000189423 973__ $$aEPFL$$rREVIEWED$$sPUBLISHED
000189423 980__ $$aARTICLE