Combining cepstral normalization and cochlear implant-like speech processing for microphone array-based speech recognition

Do, Cong-Thanh; Taghizadeh, Mohammad J.; Garner, Philip N.

doi:10.1109/SLT.2012.6424211

Do, Cong-Thanh; Taghizadeh, Mohammad J.; Garner, Philip N.

2012

Download

Formats

Format
BibTeX
MARCXML
TextMARC
MARC
DataCite
DublinCore
EndNote
NLM
RefWorks
RIS

Files

Abstract

This paper investigates the combination of cepstral normalization and cochlear implant-like speech processing for microphone array- based speech recognition. Testing speech signals are recorded by a circular microphone array and are subsequently processed with superdirective beamforming and McCowan post-filtering. Training speech signals, from the multichannel overlapping Number corpus (MONC), are clean and not overlapping. Cochlear implant-like speech processing, which is inspired from the speech processing strategy in cochlear implants, is applied on the training and testing speech signals. Cepstral normalization, including cepstral mean and variance normalization (CMN and CVN), are applied on the training and testing cepstra. Experiments show that implementing either cepstral normalization or cochlear implant-like speech pro- cessing helps in reducing the WERs of microphone array-based speech recognition. Combining cepstral normalization and cochlear implant-like speech processing reduces further the WERs, when there is overlapping speech. Train/test mismatches are measured using the Kullback-Leibler divergence (KLD), between the global probability density functions (PDFs) of training and testing cepstral vectors. This measure reveals a train/test mismatch reduction when either cepstral normalization or cochlear implant-like speech pro- cessing is used. It reveals also that combining these two processing reduces further the train/test mismatches as well as the WERs.