Multimodal Speaker Localization in a Probabilistic Framework

A multimodal probabilistic framework is proposed for the problem of finding the active speaker in a video sequence. We localize the current speaker's mouth in the image by jointly using the video and audio channels. We propose a novel visual feature that is well suited to analyzing the movement of the mouth. After estimating the joint probability density of the audio and visual features, we find the most probable location of the current speaker's mouth in a sequence of images. The proposed method is tested on the CUAVE audio-visual database, where it yields improved results compared with other approaches from the literature.
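
The abstract does not specify the density estimator or the visual feature, so the following is only a minimal sketch of the general idea, under assumptions that are not from the paper: the audio feature is a per-frame scalar (for example, energy), the visual feature is a per-frame scalar at each candidate image location, and the joint density at each location is modeled as a bivariate Gaussian. Under that Gaussian assumption, the audio-visual dependence at a location reduces to the mutual information -0.5 * log(1 - rho^2), and the location with the highest score is taken as the mouth estimate. The function name `localize_speaker` and its parameters are hypothetical.

```python
import numpy as np

def localize_speaker(audio, visual):
    """Pick the image location whose visual feature depends most on the audio.

    audio  : shape (T,), one scalar audio feature per frame (assumed).
    visual : shape (T, H, W), one scalar visual feature per frame and location (assumed).
    Returns ((row, col), score_map), where score_map has shape (H, W).
    """
    audio = np.asarray(audio, dtype=float)
    visual = np.asarray(visual, dtype=float)
    n_frames = visual.shape[0]

    # Center both feature streams over time.
    a = audio - audio.mean()
    v = visual - visual.mean(axis=0)

    # Per-location covariance with the audio feature, shape (H, W).
    cov = np.tensordot(a, v, axes=(0, 0)) / n_frames
    denom = a.std() * v.std(axis=0)

    # Correlation coefficient; locations with zero variance get rho = 0.
    rho = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    rho = np.clip(rho, -0.999999, 0.999999)

    # Mutual information of a bivariate Gaussian with correlation rho.
    score = -0.5 * np.log1p(-rho ** 2)

    row, col = np.unravel_index(np.argmax(score), score.shape)
    return (int(row), int(col)), score

# Synthetic usage: only location (2, 3) moves together with the audio.
rng = np.random.default_rng(0)
demo_audio = rng.normal(size=200)
demo_visual = rng.normal(size=(200, 4, 5))
demo_visual[:, 2, 3] += 3.0 * demo_audio
demo_location, demo_score = localize_speaker(demo_audio, demo_visual)
```

A Gaussian model is chosen here only because it gives a closed-form score; a nonparametric density estimate could be substituted without changing the argmax-over-locations structure.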


Published in:
14th European Signal Processing Conference (EUSIPCO), Florence, Italy, September 2006
Year:
2006
Publisher:
IEEE

Record created 2006-10-27, last modified 2018-03-17
