Investigating the use of Visual Focus of Attention for Audio-Visual Speaker Diarisation

Garau, Giulia; Ba, Silèye O.; Bourlard, Hervé; Odobez, Jean-Marc

doi:10.1145/1631272.1631387

conference paper

2009

MM '09: Proceedings of the 17th ACM international conference on Multimedia

ACM International Conference on Multimedia

Audio-visual speaker diarisation is the task of estimating "who spoke when" using audio and visual cues. In this paper we propose combining an audio diarisation system with psychology-inspired visual features, and report experiments on multiparty meetings, a challenging domain characterised by unconstrained interaction and participant movements. More precisely, the role of gaze in coordinating speaker turns was exploited through Visual Focus of Attention (VFoA) features. Experiments were performed with both the reference VFoA and three automatic VFoA estimation systems of increasing complexity, based on head pose and visual activity cues. Combined with audio features in a multi-stream approach, VFoA features yielded consistent speaker diarisation improvements.

Name: Garau_ACMMULTIMEDIA_2009.pdf

Access type: openaccess

Size: 413.57 KB

Format: Adobe PDF

Checksum (MD5): 73159fef35dc3a5e7a7b2d338363562b