AV16.3: an Audio-Visual Corpus for Speaker Localization and Tracking

Lathoud, Guillaume; Odobez, Jean-Marc; Gatica-Perez, Daniel

report

2004

Assessing the quality of a speaker localization or tracking algorithm on a few short examples is difficult, especially when the ground truth is absent or not well defined. One step towards systematic performance evaluation of such algorithms is to provide time-continuous speaker location annotation over a series of real recordings, covering various test cases. Areas of interest include audio, video and audio-visual speaker localization and tracking. The desired location annotation can be either 2-dimensional (image plane) or 3-dimensional (physical space). This paper motivates and describes a corpus of audio-visual data called "AV16.3", along with a method for 3-D location annotation based on calibrated cameras. "16.3" stands for 16 microphones and 3 cameras, recorded in a fully synchronized manner, in a meeting room. Part of this corpus has already been successfully used to report research results.
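The abstract does not spell out how the report derives 3-D locations from the calibrated cameras, so the following is only a generic illustration of the idea, not the authors' procedure. It assumes each calibrated camera has already been reduced to a viewing ray (a camera center plus a direction toward the annotated image point) and estimates the 3-D point closest to all rays in the least-squares sense. The function name triangulate_rays, its helpers, and the example coordinates are hypothetical.

```python
import math


def _normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("ray direction must be non-zero")
    return [x / n for x in v]


def _solve3(a, b):
    """Solve the 3x3 system a x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("rays are (nearly) parallel; point is not determined")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        s = m[i][3] - sum(m[i][k] * x[k] for k in range(i + 1, 3))
        x[i] = s / m[i][i]
    return x


def triangulate_rays(centers, directions):
    """Least-squares 3-D point closest to a set of rays (hypothetical helper).

    Each ray i has origin centers[i] and direction directions[i]. With
    P_i = I - d_i d_i^T (projection orthogonal to the unit direction d_i),
    the point x minimizing sum_i ||P_i (x - c_i)||^2 solves
    (sum_i P_i) x = sum_i P_i c_i.
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in zip(centers, directions):
        d = _normalize(d)
        for i in range(3):
            for j in range(3):
                p = (1.0 if i == j else 0.0) - d[i] * d[j]
                a[i][j] += p
                b[i] += p * c[j]
    return _solve3(a, b)


# Illustrative setup only: three made-up camera centers (in meters) and
# noise-free rays that all pass through a made-up target point.
target = (1.0, 2.0, 0.5)
centers = [(0.0, 0.0, 2.0), (4.0, 0.0, 2.0), (2.0, 5.0, 2.0)]
directions = [tuple(t - c for t, c in zip(target, ctr)) for ctr in centers]
estimate = triangulate_rays(centers, directions)
```

With noise-free rays the estimate coincides with the target up to floating-point error; with real annotations the rays would not intersect exactly, and the same solve returns the point with the smallest summed squared distance to them. Obtaining the rays from pixel coordinates requires the camera calibration itself, which is outside this sketch.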

Name: rr-04-28.pdf
Access type: openaccess
Size: 1.2 MB
Format: Adobe PDF
Checksum (MD5): fc9ab53124f7d1e0b6969e98a69472b7