Infoscience
conference paper

Multimodal Speaker Localization in a Probabilistic Framework

Gurban, M.; Thiran, J.
2006
14th European Signal Processing Conference (EUSIPCO), Florence, Italy, September 2006

A multimodal probabilistic framework is proposed for finding the active speaker in a video sequence. We localize the current speaker's mouth in the image by using the video and audio channels jointly. We propose a novel visual feature that is well suited to analyzing mouth movement. After estimating the joint probability density of the audio and visual features, we can find the most probable location of the current speaker's mouth in a sequence of images. The proposed method is tested on the CUAVE audio-visual database and yields improved results compared to other approaches from the literature.
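The abstract does not state which density estimator or visual feature is used, so the following is only an illustrative sketch of the general idea "estimate a joint audio-visual density, then pick the most probable mouth location". It assumes, hypothetically, a single 2-D Gaussian over scalar (audio, visual) feature pairs, fitted on frames where the speaking mouth is known, and treats frames as independent. Each candidate image location is scored by the total log-likelihood of its visual feature sequence paired with the audio, and the highest-scoring location is returned. The function names, the Gaussian model, and the toy data are assumptions; they do not reproduce the authors' feature or estimator.

```python
import math
from typing import Dict, Sequence, Tuple

Mean = Tuple[float, float]
Cov = Tuple[float, float, float]  # (var_audio, cov_audio_visual, var_visual)


def fit_gaussian_2d(pairs: Sequence[Tuple[float, float]]) -> Tuple[Mean, Cov]:
    """Maximum-likelihood mean and covariance of (audio, visual) pairs.

    Assumption: a single Gaussian stands in for the paper's (unspecified) density.
    """
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mv = sum(v for _, v in pairs) / n
    saa = sum((a - ma) ** 2 for a, _ in pairs) / n
    svv = sum((v - mv) ** 2 for _, v in pairs) / n
    sav = sum((a - ma) * (v - mv) for a, v in pairs) / n
    return (ma, mv), (saa, sav, svv)


def log_pdf(a: float, v: float, mean: Mean, cov: Cov) -> float:
    """Log density of a 2-D Gaussian, using the closed-form 2x2 inverse."""
    saa, sav, svv = cov
    det = saa * svv - sav ** 2  # must be > 0 (non-degenerate covariance)
    da, dv = a - mean[0], v - mean[1]
    # Squared Mahalanobis distance d^T Sigma^{-1} d
    q = (svv * da * da - 2.0 * sav * da * dv + saa * dv * dv) / det
    return -0.5 * q - math.log(2.0 * math.pi) - 0.5 * math.log(det)


def most_probable_location(
    audio: Sequence[float],
    visual_by_location: Dict[str, Sequence[float]],
    mean: Mean,
    cov: Cov,
) -> str:
    """Return the candidate location whose visual features best match the audio.

    Frames are treated as independent, so per-frame log-likelihoods are summed.
    """
    best, best_score = "", -math.inf
    for loc, visual in visual_by_location.items():
        score = sum(log_pdf(a, v, mean, cov) for a, v in zip(audio, visual))
        if score > best_score:
            best, best_score = loc, score
    return best


if __name__ == "__main__":
    # Hypothetical training pairs: audio and mouth motion rise and fall together.
    train = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8), (0.5, 0.4), (2.5, 2.6)]
    mean, cov = fit_gaussian_2d(train)
    audio = [0.2, 2.1, 0.3, 2.9]
    candidates = {
        "mouth": [0.3, 2.0, 0.2, 3.0],       # moves in step with the audio
        "background": [2.0, 0.2, 2.5, 0.1],  # moves against the audio
    }
    print(most_probable_location(audio, candidates, mean, cov))  # prints "mouth"
```

Summing log-likelihoods rather than multiplying raw densities avoids numerical underflow over long sequences; a real system would replace the toy scalars with per-region feature vectors and a richer density model.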

File name: Gurban2006_1490.pdf
Access type: Open access
Size: 253.71 KB
Format: Adobe PDF
Checksum (MD5): d79a9eef3927b320bba15ab339da8c47


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.