Predicting the intrusiveness of noise through sparse coding with auditory kernels

This paper presents a novel approach to predicting the intrusiveness of background noise in speech signals as perceived by human listeners. This problem is of particular interest in telephony, where the recently widened range of transmitted audio frequencies has increased the importance of appropriate background noise reduction strategies. Current approaches predict the average noise intrusiveness score that would be obtained in a subjective listening test by combining signal features related to physical properties (e.g., signal energy, spectral distribution) or psychoacoustic estimates (e.g., loudness) of the noise. Combining or implementing such features requires expert knowledge or the availability of training data. We present an approach based on a model of efficient sound coding, using a sparse spike coding representation of noise. We show that the sparsity of these representations implicitly models several factors in the perception of noise and yields predictions of noise intrusiveness scores that are comparable to or better than those of traditional features, without the use of training data. Our evaluation datasets and performance metrics are based on standardized methods for evaluating quality prediction models.
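
To make the idea of a sparse spike code concrete, the following is a minimal sketch, not the method of the paper. It assumes a small bank of gammatone-shaped kernels and plain greedy matching pursuit; the kernel parameters, the stopping rule, and the function names `gammatone` and `spike_code` are all hypothetical choices made for illustration. The abstract does not specify the kernel set, the stopping criterion, or how sparsity is mapped to an intrusiveness score.

```python
import numpy as np

def gammatone(fc, fs, dur=0.02, order=4):
    """Unit-norm gammatone-shaped kernel (parameter values are assumptions)."""
    t = np.arange(int(dur * fs)) / fs
    bw = 1.019 * (24.7 + 0.108 * fc)  # ERB-based bandwidth in Hz
    g = t ** (order - 1) * np.exp(-2 * np.pi * bw * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)

def spike_code(x, kernels, stop_ratio=0.1, max_spikes=1000):
    """Greedy matching pursuit over time-shifted kernels.

    Returns a list of spikes (kernel index, sample offset, amplitude)
    and the residual signal that the spikes do not explain.
    """
    residual = np.asarray(x, dtype=float).copy()
    target = stop_ratio * np.sum(residual ** 2)  # hypothetical stopping rule
    spikes = []
    while np.sum(residual ** 2) > target and len(spikes) < max_spikes:
        best_amp, best_k, best_i = 0.0, 0, 0
        for k, g in enumerate(kernels):
            # Inner product of the residual with the kernel at every offset.
            corr = np.correlate(residual, g, mode="valid")
            i = int(np.argmax(np.abs(corr)))
            if abs(corr[i]) > abs(best_amp):
                best_amp, best_k, best_i = corr[i], k, i
        if best_amp == 0.0:
            break
        # Subtract the best-matching kernel; it removes best_amp**2 of energy.
        g = kernels[best_k]
        residual[best_i:best_i + len(g)] -= best_amp * g
        spikes.append((best_k, best_i, best_amp))
    return spikes, residual
```

Under these assumptions, a signal that matches the kernels well is encoded with few spikes, while a broadband noise needs many spikes to reach the same residual energy. The spike count (or spike rate) is the kind of sparsity measure the abstract refers to.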

Published in:
Speech Communication, 76, 186-200

 Record created 2015-11-19, last modified 2018-09-13
