Direct optimisation of a multilayer perceptron for the estimation of cepstral mean and variance statistics
We propose an alternative means of training a multilayer perceptron for the task of speech activity detection. The training criterion uses the Kullback-Leibler divergence to minimise the error in the estimated mean and variance statistics of speech cepstrum-based features. We present our baseline and proposed speech activity detection approaches for multi-channel meeting room recordings. We demonstrate the effectiveness of the new criterion by comparing the two approaches when each is used to carry out cepstral mean and variance normalisation of the features used in our meeting automatic speech recognition (ASR) system.
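To make the quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that mean and variance statistics are accumulated over frames weighted by the detector's speech posteriors, that the reference statistics come from frames labelled as speech, and that both are compared as diagonal-covariance Gaussians with KL(estimate || reference). The weighting scheme, the direction of the divergence, and all function names are assumptions made for illustration; the MLP training itself is not shown.

```python
import math


def weighted_stats(frames, weights, var_floor=1e-8):
    """Per-dimension weighted mean and variance of cepstral frames.

    frames  : list of equal-length feature vectors (lists of floats)
    weights : one non-negative weight per frame, e.g. speech posteriors
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(frames[0])
    mean = [
        sum(w * f[d] for f, w in zip(frames, weights)) / total
        for d in range(dim)
    ]
    var = [
        max(
            sum(w * (f[d] - mean[d]) ** 2 for f, w in zip(frames, weights)) / total,
            var_floor,  # floor keeps the log and division below well defined
        )
        for d in range(dim)
    ]
    return mean, var


def gaussian_kl(mean_p, var_p, mean_q, var_q):
    """KL(p || q) between two diagonal-covariance Gaussians."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def cmvn(frames, mean, var):
    """Normalise every frame to zero mean and unit variance per dimension."""
    return [
        [(x - m) / math.sqrt(v) for x, m, v in zip(f, mean, var)]
        for f in frames
    ]


# Toy one-dimensional data: two non-speech frames followed by two speech frames.
frames = [[0.0], [2.0], [10.0], [12.0]]
reference_labels = [0.0, 0.0, 1.0, 1.0]   # hypothetical ground-truth speech labels
noisy_posteriors = [0.5, 0.5, 1.0, 1.0]   # hypothetical detector output

ref_mean, ref_var = weighted_stats(frames, reference_labels)
est_mean, est_var = weighted_stats(frames, noisy_posteriors)

kl_perfect = gaussian_kl(ref_mean, ref_var, ref_mean, ref_var)
kl_noisy = gaussian_kl(est_mean, est_var, ref_mean, ref_var)
normalised = cmvn(frames, ref_mean, ref_var)
```

In this toy case a detector that reproduces the reference labels gives a divergence of zero, while one that leaks weight onto non-speech frames shifts the estimated statistics and gives a positive divergence. That error in the statistics is the kind of quantity the proposed criterion targets, as opposed to frame-level classification error.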
dines-idiap-rr-07-13.pdf