Towards Audio-Visual On-line Diarization Of Participants In Group Meetings

Hung, Hayley; Friedland, Gerald

conference paper not in proceedings

2008

European Conference on Computer Vision Workshop on Multi-camera and Multi-modal Sensor Fusion

We propose a fully automated, unsupervised, and non-intrusive method of identifying the current speaker audio-visually in a group conversation. This is achieved without specialized hardware, user interaction, or prior assignment of microphones to participants. Speakers are identified acoustically using a novel on-line speaker diarization approach. The output is then used to find the corresponding person in a four-camera video stream by approximating individual activity with computationally efficient features. We present results showing the robustness of the association on over 4.5 hours of non-scripted audio-visual meeting data.
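The association step described above can be illustrated with a minimal sketch. The abstract does not state the association rule or the visual features used, so everything here is an assumption for illustration: each diarized speaker is represented by a hypothetical 0/1 speaking sequence, each camera by a hypothetical per-frame activity value, and a speaker is matched to the camera whose activity has the highest Pearson correlation with that speaker's speech. The names `pearson`, `associate`, `speech`, and `motion` are not from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("sequences must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def associate(speech, motion):
    """Map each speaker label to the camera whose activity correlates best with it.

    speech: dict of speaker label -> list of 0/1 speaking indicators (assumed input)
    motion: dict of camera label -> list of per-frame activity values (assumed input)
    Several speakers may map to the same camera; this sketch does not enforce
    a one-to-one assignment.
    """
    return {
        speaker: max(motion, key=lambda cam: pearson(activity, motion[cam]))
        for speaker, activity in speech.items()
    }


# Toy data, invented for illustration only.
speech = {
    "spk0": [1, 1, 0, 0, 0, 0],
    "spk1": [0, 0, 0, 1, 1, 0],
}
motion = {
    "cam_a": [0.1, 0.2, 0.1, 0.9, 0.8, 0.2],
    "cam_b": [0.7, 0.9, 0.2, 0.1, 0.1, 0.1],
}
result = associate(speech, motion)
```

In this toy input, `cam_b` is most active while `spk0` speaks and `cam_a` while `spk1` speaks, so each speaker is paired with the matching camera. A real system would compute these sequences incrementally over a sliding window rather than over the whole recording.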

Name: Hung_ECCVM2SFA2_2008.pdf
Access type: open access
Size: 712.55 KB
Format: Adobe PDF
Checksum (MD5): d3180d7a12a6a5367930c0935aa5e859