Title: A 3-D Audio-Visual Corpus of Affective Communication
Authors: Fanelli, Gabriele; Gall, Juergen; Romsdorfer, Harald; Weise, Thibaut; Van Gool, Luc
Publication year: 2010
Record deposited: 2011-12-16
DOI: 10.1109/TMM.2010.2052239
Repository: https://infoscience.epfl.ch/handle/20.500.14299/75064
Web of Science: WOS:000283291900012

Abstract: Communication between humans relies heavily on the ability to express and recognize feelings. For this reason, research on human-machine interaction needs to focus on the recognition and simulation of emotional states, a prerequisite of which is the collection of affective corpora. Currently available datasets remain a bottleneck because of the difficulties arising during the acquisition and labeling of affective data. In this work, we present a new audio-visual corpus covering possibly the two most important modalities used by humans to communicate their emotional states, namely speech and facial expression in the form of dense dynamic 3-D face geometries. We acquire high-quality data by working in a controlled environment and use video clips to induce affective states. The annotation of the speech signal includes the transcription of the corpus text into a phonological representation, accurate phone segmentation, fundamental frequency extraction, and signal intensity estimation. We employ a real-time 3-D scanner to acquire dense dynamic facial geometries and track the faces throughout the sequences, achieving full spatial and temporal correspondences. The corpus is a valuable tool for applications like affective visual speech synthesis or view-independent facial expression recognition.

Keywords: Audio-visual database; emotional speech; face tracking; visual speech modeling; 3-D face modeling; emotion; speech; recognition; databases; induction; states; mood

Document type: Journal research article
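The abstract mentions fundamental frequency extraction and signal intensity estimation as part of the speech annotation. As a minimal sketch of what such measurements look like, the snippet below estimates an F0 track and frame-wise intensity from a recording using the librosa library; this is not the authors' annotation pipeline, and the file name and parameter values are illustrative assumptions.

```python
# Minimal sketch (not the corpus authors' pipeline): estimate fundamental
# frequency (F0) and signal intensity from a speech recording with librosa.
import numpy as np
import librosa

def analyze_speech(wav_path):
    # Load the recording at its native sampling rate.
    y, sr = librosa.load(wav_path, sr=None)

    # F0 track via probabilistic YIN; the range roughly covers adult voices.
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y,
        fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C6"),
        sr=sr,
    )

    # Frame-wise intensity approximated by RMS energy, converted to dB.
    rms = librosa.feature.rms(y=y)[0]
    intensity_db = librosa.amplitude_to_db(rms, ref=np.max)

    return f0, voiced_flag, intensity_db

if __name__ == "__main__":
    # "example_utterance.wav" is a hypothetical placeholder file.
    f0, voiced, intensity = analyze_speech("example_utterance.wav")
    print("median F0 (Hz):", np.nanmedian(f0))          # NaN frames are unvoiced
    print("mean intensity (dB):", float(np.mean(intensity)))
```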