Infoscience
conference paper

Multimodal Multispeaker Probabilistic Tracking in Meetings

Gatica-Perez, Daniel
•
Lathoud, Guillaume
•
Odobez, Jean-Marc
•
McCowan, Iain A.
2005
ICMI '05: Proceedings of the 7th international conference on Multimodal interfaces
Int. Conf. on Multimodal Interfaces (ICMI)

Tracking speakers in multiparty conversations constitutes a fundamental task for automatic meeting analysis. In this paper, we present a probabilistic approach to jointly track the location and speaking activity of multiple speakers in a multisensor meeting room equipped with a small microphone array and multiple uncalibrated cameras. Our framework is based on a mixed-state dynamic graphical model defined on a multiperson state space, which includes an explicit proximity-based interaction model. The model integrates audio-visual (AV) data through a novel observation model: audio observations are derived from a source localization algorithm, and visual observations are based on models of the shape and spatial structure of human heads. Because of the model's complexity, approximate inference is performed with a Markov Chain Monte Carlo particle filter (MCMC-PF), which yields high sampling efficiency. We present results, based on an objective evaluation procedure, showing that our framework (1) can locate and track the position and speaking activity of multiple meeting participants engaged in real conversations with good accuracy; (2) can deal with visual clutter and partial occlusion; and (3) significantly outperforms a traditional sampling-based approach.
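The abstract names an MCMC particle filter over a joint multiperson state as the inference engine. The following is a minimal toy sketch of that general technique, not the authors' model. It assumes a hypothetical Gaussian 2-D position measurement per person, a hypothetical pairwise proximity penalty standing in for the interaction model, and Gaussian random-walk dynamics; the audio-visual observations and the speaking-activity part of the mixed state are omitted. All function and parameter names (`mcmc_pf_step`, `log_interaction`, `p_joint`, and so on) are illustrative.

```python
import numpy as np


def log_interaction(X, strength=5.0, scale=0.5):
    """Hypothetical proximity prior log Psi(X): penalises people standing too close."""
    total = 0.0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            d2 = np.sum((X[i] - X[j]) ** 2)
            total -= strength * np.exp(-d2 / (2.0 * scale**2))
    return total


def log_likelihood(X, Z, obs_std=0.3):
    """Hypothetical observation model: one noisy 2-D position measurement per person."""
    return -np.sum((X - Z) ** 2) / (2.0 * obs_std**2)


def log_target(X, Z):
    # Unnormalised log-posterior terms that do not involve the motion model.
    return log_likelihood(X, Z) + log_interaction(X)


def mcmc_pf_step(prev, Z, rng, n_iter=3000, burn_in=500,
                 motion_std=0.2, p_joint=0.2):
    """One MCMC particle-filter update (toy sketch, assumed Gaussian random-walk dynamics).

    prev: (N, K, 2) particles at t-1 (N joint hypotheses, K people, 2-D positions).
    The chain runs on (X, r), where r indexes the ancestor particle, so the
    motion-model terms cancel in every acceptance ratio below.
    """
    N, K, D = prev.shape
    r = rng.integers(N)
    X = prev[r] + motion_std * rng.standard_normal((K, D))
    lp = log_target(X, Z)
    samples = []
    for it in range(n_iter):
        if rng.random() < p_joint:
            # Joint move: new ancestor, whole configuration drawn from the dynamics.
            r_new = rng.integers(N)
            X_new = prev[r_new] + motion_std * rng.standard_normal((K, D))
        else:
            # Single-person move: same ancestor, redraw one person from the dynamics.
            r_new = r
            X_new = X.copy()
            i = rng.integers(K)
            X_new[i] = prev[r, i] + motion_std * rng.standard_normal(D)
        lp_new = log_target(X_new, Z)
        # Metropolis-Hastings acceptance on likelihood x interaction only.
        if np.log(rng.random()) < lp_new - lp:
            X, lp, r = X_new, lp_new, r_new
        if it >= burn_in:
            samples.append(X.copy())
    # Thin the post-burn-in chain to N equally weighted particles.
    idx = np.linspace(0, len(samples) - 1, N).astype(int)
    return np.stack([samples[j] for j in idx])


rng = np.random.default_rng(0)
truth = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])  # K = 3 people
prev = truth + 0.1 * rng.standard_normal((200, 3, 2))     # N = 200 particles at t-1
Z = truth + np.array([[0.1, 0.0], [0.0, 0.1], [-0.1, 0.0]])
particles = mcmc_pf_step(prev, Z, rng)
estimate = particles.mean(axis=0)  # posterior-mean position for each person
print(estimate.round(2))
```

The single-person move illustrates one common reason MCMC sampling is efficient in a joint multiperson state space: each proposal changes one participant while the others keep their current values, instead of redrawing all K positions at once.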

Type
conference paper
DOI
10.1145/1088463.1088496
Author(s)
Gatica-Perez, Daniel  
Lathoud, Guillaume  
Odobez, Jean-Marc  
McCowan, Iain A.
Date Issued
2005
Published in
ICMI '05: Proceedings of the 7th international conference on Multimodal interfaces
Start page
183
End page
190
Subjects
speech • vision
URL
http://publications.idiap.ch/downloads/reports/2004/rr-04-66.pdf
Related documents
http://publications.idiap.ch/index.php/publications/showcite/gatica05c
Written at
EPFL
EPFL units
LIDIAP  
Event name
Int. Conf. on Multimodal Interfaces (ICMI)
Available on Infoscience
March 10, 2006
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/228774