Binaural Audio Signal Processing Using Interaural Coherence Matching
Binaural room impulse responses (BRIRs) characterize the transfer of sound from a source in a room to the left and right ear entrances of a listener. Applying BRIRs to sound source signals enables headphone listening with the perception of a three-dimensional auditory image. BRIRs are usually linear filters several hundred milliseconds to several seconds long, so their waveforms contain a vast amount of information. This thesis studies the modeling of BRIRs with a reduced set of parameters. It is shown that late BRIR tails can be modeled perceptually accurately by considering only the time-frequency energy decay relief and the frequency-dependent interaural coherence (IC). This insight into BRIR modeling enables a number of algorithms with advantages over the previous state of the art. Three such algorithms are proposed.

The first algorithm makes it possible to obtain BRIRs by measuring room properties and listener properties separately, vastly reducing the number of measurements needed to obtain listener-specific BRIRs for a number of listeners and rooms. The listener properties are measured as a head-related transfer function (HRTF) set and the room properties are measured as a B-format¹ room impulse response (RIR). It is shown how to combine the HRTF set of the listener with a B-format RIR to obtain BRIRs for that room individualized for the listener. This technique applies the insight on BRIR perception by computing the BRIR tail as a frequency-dependent linear combination of B-format channels, designed to obtain the desired energy decay relief and interaural coherence.

A serious problem related to convolving sound source signals with BRIRs is the computational complexity of implementing long BRIRs as finite impulse response (FIR) filters.
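To give a feel for the scale of this cost, the following is a rough back-of-the-envelope sketch in Python. The sample rate and BRIR lengths are illustrative assumptions, not values taken from the thesis, and the helper name `fir_macs_per_second` is hypothetical. It counts only the multiply-accumulate (MAC) operations of a direct-form FIR filter, one filter per ear.

```python
# Rough cost of applying a BRIR pair as direct-form FIR filters.
# All numbers here are illustrative assumptions, not figures from the thesis.

def fir_macs_per_second(brir_seconds: float, sample_rate_hz: int,
                        num_filters: int = 2) -> int:
    """Multiply-accumulate operations per second for direct-form FIR filtering.

    num_filters defaults to 2: one filter for the left ear, one for the right.
    """
    taps = round(brir_seconds * sample_rate_hz)  # filter length in samples
    # Every output sample costs `taps` MACs per filter.
    return taps * sample_rate_hz * num_filters


if __name__ == "__main__":
    for seconds in (0.3, 1.0, 2.0):
        macs = fir_macs_per_second(seconds, 48_000)
        print(f"{seconds:.1f} s BRIR pair at 48 kHz: {macs / 1e9:.2f} GMAC/s")
```

The cost grows linearly with the BRIR length and quadratically with the sample rate. FFT-based convolution lowers the operation count considerably, but it typically adds latency or block-partitioning overhead, which is one reason a recursive reverberator structure can be attractive for the late tail.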
Inspired by the perceptual experiments on BRIR tails, a modified Jot reverberator is proposed that simulates BRIR tails with the desired frequency-dependent interaural coherence while requiring significantly less computational power than direct application of BRIRs. Also inspired by the perception of BRIRs, an extension of this reverberator is proposed that uses two parallel feedback delay networks to efficiently model both the reverberation tail with the correct coherence and distinct early reflections.

If stereo signals are played back over headphones, the listener receives unnatural binaural cues, e.g., interaural level difference (ILD) changes not accompanied by corresponding interaural time difference (ITD) changes, or diffuse sound with unnatural IC. To simulate stereo listening in a room and avoid these unnatural cues, BRIRs can be applied to the left and right stereo channels. Besides the computational complexity of applying the BRIR filters, this technique has a number of disadvantages. The room associated with the BRIRs is imposed on the stereo signal, which usually already contains reverberation, so applying BRIRs changes the reverberation time and introduces coloration. A technique is proposed in which the direct sound is rendered using data extracted from HRTFs and the ambient sound contained in the stereo signal is modified such that its coherence matches the coherence of a binaural recording of diffuse sound, without modifying its spectrum.

Implementations of reverberators based on general feedback delay networks (e.g., Jot reverberators) can require a high number of operations for implementing the so-called feedback matrix. For applications that need a large number of channels, such as decorrelators, this can pose a real problem. Special types of matrices are known that can be implemented efficiently because all matrix elements have the same magnitude.
However, the complexity can also be reduced by introducing many zero elements. Different types of such sparse feedback matrices are proposed and tested for their suitability in Jot reverberators. A highly efficient feedback matrix is obtained by combining both approaches, choosing the nonzero elements of a sparse matrix from efficiently implementable Hadamard matrices.

______________________________
¹ B-format refers to a 4-channel signal recorded with four coincident microphones: one omni and three dipole microphones pointing in orthogonal directions.
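The idea of filling a sparse feedback matrix with Hadamard elements can be illustrated with a minimal sketch. The construction below is only one plausible arrangement and is an assumption, not necessarily one of the matrix types proposed in the thesis: small scaled Hadamard blocks are placed on the diagonal, and the rows are then shifted cyclically by one channel so that energy circulates between blocks. The function names and the choice of 16 channels with 4x4 blocks are hypothetical.

```python
import math


def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # [[H, H], [H, -H]] doubles the size at each step.
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def sparse_hadamard_feedback(num_channels, block_size):
    """Sparse orthogonal matrix: scaled Hadamard blocks on the diagonal,
    with rows shifted cyclically by one channel to couple the blocks.
    (Illustrative assumption, not the thesis' exact construction.)"""
    if num_channels % block_size:
        raise ValueError("num_channels must be a multiple of block_size")
    block = hadamard(block_size)
    scale = 1.0 / math.sqrt(block_size)  # makes each block orthonormal
    a = [[0.0] * num_channels for _ in range(num_channels)]
    for i in range(num_channels):
        start = (i // block_size) * block_size  # first input column of this block
        out_row = (i + 1) % num_channels        # cyclic shift by one channel
        for j in range(block_size):
            a[out_row][start + j] = scale * block[i % block_size][j]
    return a


def is_orthogonal(a, tol=1e-12):
    """Check that the rows of `a` are orthonormal, i.e. the matrix is lossless."""
    n = len(a)
    for r in range(n):
        for s in range(n):
            dot = sum(a[r][k] * a[s][k] for k in range(n))
            if abs(dot - (1.0 if r == s else 0.0)) > tol:
                return False
    return True


if __name__ == "__main__":
    a = sparse_hadamard_feedback(16, 4)
    nonzeros = sum(x != 0.0 for row in a for x in row)
    print("orthogonal:", is_orthogonal(a))
    print("nonzero elements:", nonzeros, "of", 16 * 16)
```

Because every nonzero element has the same magnitude, each block can be applied with additions and subtractions followed by a single scaling, and the zeros remove most of the remaining work. Whether a given sparse pattern still yields sufficiently dense and uncolored reverberation is a perceptual question that this sketch does not address.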
EPFL_TH4643.pdf