Sparse Gammatone Signal Model Predicts Perceived Noise Intrusiveness

Is it possible to predict the intrusiveness of background noise in speech signals as perceived by humans? Answering this question is important for the automatic evaluation of speech enhancement systems, including those designed for new wideband speech telephony, and is a goal of a future ITU quality assessment standard. In this paper, we show that such prediction is possible by modeling the encoding of the noise signal at the auditory nerve. Recent research suggests that sparse signal representations may reflect the encoding process in the auditory system, which makes them attractive for modeling human sound perception. We explore this hypothesis further by decomposing the background noise in the speech signal into a sparse combination of gammatone functions, which yields a sparse, physiologically grounded representation of the noise. We then show that the number of gammatones required to encode the noise is directly correlated with perceived noise intrusiveness. Furthermore, an established measure of noise intrusiveness computed on this new representation outperforms the same measure computed on the traditional loudness model.
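The abstract does not say which sparse decomposition algorithm is used, so the following is only a minimal sketch under stated assumptions. It uses greedy matching pursuit (an assumed, swapped-in technique) over unit-norm, time-shifted gammatone kernels and counts how many atoms are needed before the residual falls a chosen number of decibels below the input energy. The names `gammatone` and `count_gammatones`, the 4th-order kernel with bandwidth factor `b=1.019`, the Glasberg and Moore ERB approximation, the 50 ms kernel length, the residual-SNR stopping rule, and the `center_freqs` grid are all illustrative assumptions. The sketch also assumes that the noise component has already been separated from the speech and is available as its own array.

```python
import numpy as np


def gammatone(fc, fs, duration=0.05, order=4, b=1.019):
    """Unit-norm gammatone kernel t^(n-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t).

    Hypothetical helper: order, b and duration are assumed illustrative values.
    """
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 + 0.108 * fc  # Glasberg & Moore ERB approximation (Hz)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * erb * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)


def count_gammatones(noise, fs, center_freqs, target_snr_db=20.0, max_atoms=10_000):
    """Greedy matching pursuit over time-shifted gammatone kernels (assumed method).

    Adds one atom at a time until the residual energy is target_snr_db below
    the input energy (or max_atoms is reached) and returns the atom count.
    Assumes len(noise) is at least as long as each kernel.
    """
    kernels = [gammatone(fc, fs) for fc in center_freqs]
    residual = np.asarray(noise, dtype=float).copy()
    stop_energy = np.sum(residual ** 2) * 10 ** (-target_snr_db / 10)
    n_atoms = 0
    while np.sum(residual ** 2) > stop_energy and n_atoms < max_atoms:
        best_coef, best_k, best_shift = 0.0, None, 0
        for k, g in enumerate(kernels):
            # Inner product of the residual with every valid shift of kernel g
            corr = np.correlate(residual, g, mode="valid")
            i = int(np.argmax(np.abs(corr)))
            if abs(corr[i]) > abs(best_coef):
                best_coef, best_k, best_shift = corr[i], k, i
        if best_k is None:  # residual orthogonal to every atom: stop
            break
        g = kernels[best_k]
        # Kernels are unit-norm, so subtracting coef * g removes coef**2 energy
        residual[best_shift:best_shift + len(g)] -= best_coef * g
        n_atoms += 1
    return n_atoms
```

For example, with `fs = 8000` and `center_freqs = np.geomspace(100, 3500, 16)`, `count_gammatones(noise, fs, center_freqs)` returns the number of atoms that this greedy decomposition needs for a 20 dB reconstruction. Under the correlation reported above, noises that need more atoms would be expected to be rated as more intrusive. The count depends on the assumed stopping threshold, kernel set, and decomposition algorithm, so it should be read as a relative measure only.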
