Parametric coding of spatial audio

A wide range of techniques for coding a single speech or audio signal channel have been developed over the last few decades. In addition to pure redundancy reduction, sophisticated source and receiver models have been considered for reducing the bitrate. Only a few techniques address joint coding of the channels of stereo¹ and multi-channel audio signals. Stereo and multi-channel audio signals evoke an auditory spatial image in a listener. Thus, the receiver model may exploit properties of the auditory system's spatial hearing to reduce the bitrate. Previous techniques have done this by considering the importance of interaural level difference cues at high frequencies and by considering the binaural masking level difference when computing the masked threshold for multiple audio channels. The coding scheme proposed in this thesis aims at being more systematic and parameterized. A stereo or multi-channel audio signal is represented as a single downmixed audio channel plus side information. The side information contains the inter-channel cues inherent in the original audio signal that are relevant for the perception of the properties of the auditory spatial image. At the decoder, the stereo or multi-channel audio signal is reconstructed such that its inter-channel cues approximate the corresponding cues of the original audio signal. This enables coding of stereo or multi-channel audio signals at a bitrate nearly as low as a mono audio coding bitrate, because the side information contains about two orders of magnitude less information than the original audio channel waveforms. This corresponds to a significant bitrate reduction compared to conventional state-of-the-art coders. Several subjective tests were conducted, indicating that good audio quality can be achieved by the proposed scheme. A number of variations of the coding scheme are proposed. These include different combinations of conventional multi-channel audio coders and the proposed coding scheme, and a scheme which provides flexibility at the decoder to manipulate the auditory spatial image. A model for source localization in the presence of concurrent sound (other sources and reflections) is proposed. The model successfully predicts the results of a number of previous psychophysical studies. The model is also applied for comparing audio signals to corresponding signals coded with the proposed scheme.

¹ In this thesis, the term "stereo audio signal" always refers to two-channel stereo audio signals.
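The representation described in the abstract, a single downmixed channel plus inter-channel cues as side information, can be illustrated with a small sketch. This is not the thesis's implementation: it assumes a single full-band level difference per frame as the only cue, whereas a real coder of this kind would typically work in frequency subbands and use additional cues. The function names `encode` and `decode` and the `frame` parameter are hypothetical.

```python
import math

def encode(left, right, frame=4):
    """Return (mono downmix, per-frame inter-channel level differences in dB)."""
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    cues_db = []
    eps = 1e-12  # avoids log of zero for silent frames
    for start in range(0, len(left), frame):
        power_l = sum(x * x for x in left[start:start + frame]) + eps
        power_r = sum(x * x for x in right[start:start + frame]) + eps
        cues_db.append(10.0 * math.log10(power_l / power_r))
    return mono, cues_db

def decode(mono, cues_db, frame=4):
    """Rebuild two channels whose per-frame level difference matches the cues."""
    left, right = [], []
    for i, cue in enumerate(cues_db):
        ratio = 10.0 ** (cue / 20.0)  # amplitude ratio left/right
        # Gains chosen so that gain_l / gain_r == ratio and
        # gain_l + gain_r == 2, which keeps (left + right) / 2
        # equal to the transmitted downmix.
        gain_r = 2.0 / (1.0 + ratio)
        gain_l = ratio * gain_r
        for x in mono[i * frame:(i + 1) * frame]:
            left.append(gain_l * x)
            right.append(gain_r * x)
    return left, right
```

In this sketch the side information is one number per frame, while the waveform is carried only once in the downmix. The decoder matches the level cue of the original rather than its waveforms; the two coincide only in special cases, such as channels that are scaled copies of each other.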

Vetterli, Martin
Lausanne, EPFL
Nominated for the "EPFL doctorate award 2004" ("Prix EPFL de doctorats 2004") with special distinction
Other identifiers:
urn: urn:nbn:ch:bel-epfl-thesis3062-8

 Record created 2005-03-16, last modified 2018-03-17
