A 3-D Audio-Visual Corpus of Affective Communication
Communication between humans relies deeply on the ability to express and recognize feelings. For this reason, research on human-machine interaction needs to focus on the recognition and simulation of emotional states, a prerequisite of which is the collection of affective corpora. Currently available datasets remain a bottleneck because of the difficulties that arise in acquiring and labeling affective data. In this work, we present a new audio-visual corpus covering what are possibly the two most important modalities used by humans to communicate their emotional states, namely speech and facial expression, the latter in the form of dense dynamic 3-D face geometries. We acquire high-quality data in a controlled environment and use video clips to induce affective states. The annotation of the speech signal includes a phonological transcription of the corpus text, accurate phone segmentation, fundamental frequency extraction, and signal intensity estimation. We employ a real-time 3-D scanner to acquire dense dynamic facial geometries and track the faces throughout the sequences, achieving full spatial and temporal correspondence. The corpus is a valuable tool for applications such as affective visual speech synthesis and view-independent facial expression recognition.