Action Filename Description Size Access License Resource Version
Show more files...


The prototypical signal processing pipeline can be divided into four blocks. Representation of the signal in a basis suitable for processing. Enhancement of the meaningful part of the signal and noise reduction. Estimation of important statistical properties of the signal. Adaptive processing to track and adapt to changes in the signal statistics. This thesis revisits each of these blocks and proposes new algorithms, borrowing ideas from information theory, theoretical computer science, or communications. First, we revisit the Walsh-Hadamard transform (WHT) for the case of a signal sparse in the transformed domain, namely that has only K ≀ N non-zero coefficients. We show that an efficient algorithm exists that can compute these coefficients in O(K log2(K) log2(N/K)) and using only O(K log2(N/K)) samples. This algorithm relies on a fast hashing procedure that computes small linear combinations of transformed domain coefficients. A bipartite graph is formed with linear combinations on one side, and non-zero coefficients on the other. A peeling decoder is then used to recover the non-zero coefficients one by one. A detailed analysis of the algorithm based on error correcting codes over the binary erasure channel is given. The second chapter is about beamforming. Inspired by the rake receiver from wireless communications, we recognize that echoes in a room are an important source of extra signal diversity. We extend several classic beamforming algorithms to take advantage of echoes and also propose new optimal formulations. We explore formulations both in time and frequency domains. We show theoretically and in numerical simulations that the signal-to-interference-and-noise ratio increases proportionally to the number of echoes used. Finally, beyond objective measures, we show that echoes also directly improve speech intelligibility as measured by the perceptual evaluation of speech quality (PESQ) metric. Next, we attack the problem of direction of arrival of acoustic sources, to which we apply a robust finite rate of innovation reconstruction framework. FRIDA — the resulting algorithm — exploits wideband information coherently, works at very low signal-to-noise ratio, and can resolve very close sources. The algorithm can use either raw microphone signals or their cross- correlations. While the former lets us work with correlated sources, the latter creates a quadratic number of measurements that allows to locate many sources with few microphones. Thorough experiments on simulated and recorded data shows that FRIDA compares favorably with the state-of-the-art. We continue by revisiting the classic recursive least squares (RLS) adaptive filter with ideas borrowed from recent results on sketching least squares problems. The exact update of RLS is replaced by a few steps of conjugate gradient descent. We propose then two different precondi- tioners, obtained by sketching the data, to accelerate the convergence of the gradient descent. Experiments on artificial as well as natural signals show that the proposed algorithm has a performance very close to that of RLS at a lower computational burden. The fifth and final chapter is dedicated to the software and hardware tools developed for this thesis. We describe the pyroomacoustics Python package that contains routines for the evaluation of audio processing algorithms and reference implementations of popular algorithms. We then give an overview of the microphone arrays developed.