000192303 001__ 192303
000192303 005__ 20190416220311.0
000192303 037__ $$aCONF
000192303 245__ $$aFinding Audio-Visual Events in Informal Social Gatherings
000192303 269__ $$a2011
000192303 260__ $$c2011
000192303 336__ $$aConference Papers
000192303 500__ $$aOutstanding paper award
000192303 520__ $$aIn this paper we address the problem of detecting and localizing objects that can be both seen and heard, e.g., people. This may be solved within the framework of data clustering. We propose a new multimodal clustering algorithm based on a Gaussian mixture model, where one of the modalities (visual data) is used to supervise the clustering process. This is made possible by mapping both modalities into the same metric space. To this end, we fully exploit the geometric and physical properties of an audio-visual sensor based on binocular vision and binaural hearing. We propose an EM algorithm that is theoretically well justified, intuitive, and extremely efficient from a computational point of view. This efficiency makes the method implementable on advanced platforms such as humanoid robots. We describe in detail tests and experiments performed with publicly available data sets that yield very interesting results.
000192303 6531_ $$a3D localization
000192303 6531_ $$aaudio-visual fusion
000192303 6531_ $$aevent detection
000192303 6531_ $$ascene analysis
000192303 700__ $$aAlameda-Pineda, Xavier
000192303 700__ $$aKhalidov, Vasil
000192303 700__ $$aHoraud, Radu
000192303 700__ $$aForbes, Florence
000192303 7112_ $$aIEEE/ACM 13th International Conference on Multimodal Interaction
000192303 8564_ $$s15852070$$uhttps://infoscience.epfl.ch/record/192303/files/Alameda-Pineda_ICMI_2011.pdf$$yn/a$$zn/a
000192303 909C0 $$0252189$$pLIDIAP$$xU10381
000192303 909CO $$ooai:infoscience.tind.io:192303$$pconf$$pSTI$$qGLOBAL_SET
000192303 937__ $$aEPFL-CONF-192303
000192303 970__ $$aAlameda-Pineda_ICMI_2011/LIDIAP
000192303 973__ $$aEPFL
000192303 980__ $$aCONF