Multimodal Speaker Localization in a Probabilistic Framework

Gurban, M.; Thiran, J.

2006

Abstract

A multimodal probabilistic framework is proposed for finding the active speaker in a video sequence. We localize the current speaker's mouth in the image by using the video and audio channels together. We propose a novel visual feature that is well suited to analyzing mouth movement. After estimating the joint probability density of the audio and visual features, we can find the most probable location of the current speaker's mouth in a sequence of images. The proposed method is tested on the CUAVE audio-visual database and yields improved results compared to other approaches from the literature.
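The abstract does not spell out the visual feature or the exact decision rule. The sketch below only illustrates the general idea of scoring candidate mouth regions through an estimated audio-visual joint density. Several assumptions are made here: each candidate region and the audio track are reduced to one scalar feature per frame, the joint and marginal densities are estimated with a Gaussian kernel density estimator with a fixed bandwidth of 0.3 on standardized features, and candidates are ranked by the resulting mutual-information estimate. The function names (`gaussian_kde_logpdf`, `mutual_information`, `most_likely_speaker`) are hypothetical. This is a generic technique, not necessarily the authors' method.

```python
import numpy as np


def gaussian_kde_logpdf(samples, points, bandwidth):
    """Log-density of `points` under a Gaussian KDE built on `samples`.

    samples: (n, d) array, points: (m, d) array. The same isotropic
    bandwidth is used in every dimension (an assumption of this sketch).
    """
    n, d = samples.shape
    diff = points[:, None, :] - samples[None, :, :]       # (m, n, d)
    sq = np.sum(diff ** 2, axis=-1) / bandwidth ** 2       # (m, n)
    log_norm = -0.5 * d * np.log(2 * np.pi * bandwidth ** 2) - np.log(n)
    # log-sum-exp over the n kernels for numerical stability
    a = -0.5 * sq
    amax = a.max(axis=1, keepdims=True)
    return log_norm + amax[:, 0] + np.log(np.exp(a - amax).sum(axis=1))


def mutual_information(audio, visual, bandwidth=0.3):
    """Resubstitution estimate of I(A; V) from paired 1-D feature sequences."""
    a = (audio - audio.mean()) / (audio.std() + 1e-12)
    v = (visual - visual.mean()) / (visual.std() + 1e-12)
    joint = np.column_stack([a, v])
    log_joint = gaussian_kde_logpdf(joint, joint, bandwidth)
    log_a = gaussian_kde_logpdf(a[:, None], a[:, None], bandwidth)
    log_v = gaussian_kde_logpdf(v[:, None], v[:, None], bandwidth)
    # Average of log p(a, v) - log p(a) - log p(v) over the observed frames
    return float(np.mean(log_joint - log_a - log_v))


def most_likely_speaker(audio, candidate_visuals):
    """Index of the candidate region whose features depend most on the audio."""
    scores = [mutual_information(audio, v) for v in candidate_visuals]
    return int(np.argmax(scores)), scores


# Synthetic example: only candidate 1 moves in step with the audio.
rng = np.random.default_rng(0)
T = 200
audio = rng.standard_normal(T)
candidates = [
    rng.standard_normal(T),                   # region unrelated to the audio
    audio + 0.3 * rng.standard_normal(T),     # region that follows the audio
    rng.standard_normal(T),                   # region unrelated to the audio
]
best, scores = most_likely_speaker(audio, candidates)
```

Because each score is a resubstitution estimate, it carries a small positive bias; only the relative ranking across candidate regions is used, so that bias is tolerable for this illustration.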

Details

Title: Multimodal Speaker Localization in a Probabilistic Framework

Author(s): Gurban, M.; Thiran, J.

Published in: 14th European Signal Processing Conference (EUSIPCO), Florence, Italy, September 2006

Series: Parallel Computing in Electrical Engineering

Date: 2006

Publisher: IEEE

Keywords (free): LTS5

Laboratories: LTS5

The document appears in: Scientific production and competences > STI - School of Engineering > IEM - Institute of Electrical and Micro Engineering > LTS5 - Signal Processing Laboratory 5
Peer-reviewed publications
Conference papers
Work produced at EPFL
Published

Record created: 2006-10-27