Enabling Speech Applications using Ad Hoc Microphone Arrays

Microphone arrays play a central role in hands-free speech interface applications. The main task of a microphone array is to capture distant-talking speech with high quality. A microphone array can acquire the desired speech signal selectively by steering its beampattern towards the desired speaker. The prospect of ubiquitous sensing, motivated by the abundance of microphone-embedded devices such as notebooks and smartphones, raises the importance of research on ad hoc microphone arrays. The key challenges pertain to the unknown geometry of the microphones and to asynchronous recordings. The goal of this PhD thesis is to address the issues of microphone and source localization in order to enable beamforming for higher-level speech processing tasks. To that end, we exploit prior knowledge of the acoustical and geometrical structures underlying the ad hoc distributed nodes to devise novel algorithms for microphone array calibration and source localization, as well as beamforming techniques for distant speech applications. To address the problem of ad hoc microphone array calibration, the analytic diffuse sound field coherence model is investigated and its fundamental properties are studied. This model enables pairwise distance estimation for calibration of a relatively compact microphone array. For the estimation of long pairwise distances, we derive a mathematical framework that exploits the low-rank property of the Euclidean distance matrix, and we develop a novel matrix completion algorithm for ad hoc microphone array calibration along with theoretical guarantees. Furthermore, the problem of source localization using ad hoc microphones in a reverberant enclosure is addressed. We incorporate the image model of multipath propagation to construct a Euclidean distance matrix. The low-rank structure of the distance matrix is exploited to identify the support of the room impulse response function and its unique mapping to the source location.
This approach enables single-channel and distributed source localization from asynchronous recordings provided by ad hoc microphones. Along this line, we address the problem of robust microphone array placement to optimize the localization performance. Finally, spatial filtering techniques relying on beamforming are investigated for high-quality speech acquisition and higher-level applications. We develop beamformers for joint multi-speaker localization and voice activity detection. In addition, the broadband beampattern of a microphone array is characterized, and its relation to speech recognition accuracy is derived so that recognition performance can be predicted.
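
The low-rank property of the Euclidean distance matrix mentioned above can be illustrated with a small numerical sketch. This is not the matrix completion algorithm developed in the thesis: it covers only the idealized case where every pairwise distance is known and noise-free, and it uses classical multidimensional scaling (a standard, swapped-in technique) to recover the geometry. The function names, the eight microphones, and the 5 m room size are assumptions made for illustration.

```python
import numpy as np


def squared_edm(points):
    """Matrix of squared pairwise Euclidean distances between rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def classical_mds(D, dim):
    """Recover coordinates (up to rotation, reflection and translation)
    from a complete, noise-free squared distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    G = -0.5 * J @ D @ J                      # Gram matrix of centered points
    w, V = np.linalg.eigh(G)                  # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]           # keep the `dim` largest
    w_top = np.clip(w[idx], 0.0, None)        # guard against tiny negatives
    return V[:, idx] * np.sqrt(w_top)


# Hypothetical setup: 8 microphones placed at random in a 5 m cube.
rng = np.random.default_rng(0)
mics = rng.uniform(0.0, 5.0, size=(8, 3))

D = squared_edm(mics)

# For points in general position in d dimensions, the squared distance
# matrix has rank at most d + 2, regardless of the number of points.
edm_rank = np.linalg.matrix_rank(D, tol=1e-8)

# The recovered geometry reproduces the original pairwise distances.
estimate = classical_mds(D, dim=3)
max_err = np.max(np.abs(D - squared_edm(estimate)))

print("rank of squared distance matrix:", edm_rank)
print("max distance reconstruction error:", max_err)
```

Because the rank is bounded by the ambient dimension plus two rather than by the number of microphones, a distance matrix with missing entries (for example, the long pairwise distances that the diffuse field coherence model cannot estimate) is a natural candidate for completion, which is the setting the thesis addresses.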
