research article

Prediction of Asynchronous Dimensional Emotion Ratings from Audiovisual and Physiological Data

Ringeval, Fabien • Eyben, Florian • Kroupi, Eleni • et al.
2015
Pattern Recognition Letters

Automatic emotion recognition systems based on supervised machine learning require reliable annotation of affective behaviours to build useful models. Whereas the dimensional approach is becoming more and more popular for rating affective behaviours in continuous time domains, e.g., arousal and valence, methodologies that take into account the reaction lags of the human raters are still rare. We therefore investigate the relevance of using machine learning algorithms able to integrate contextual information in the modelling, as long short-term memory recurrent neural networks do, to automatically predict emotion from several (asynchronous) raters in continuous time domains, i.e., arousal and valence. Evaluations are performed on the recently proposed RECOLA multimodal database (27 subjects, 5 min of data and six raters for each), which includes audio, video, and physiological (ECG, EDA) data; studies uniting audiovisual and physiological information are still very rare. Features are extracted with various window sizes for each modality, and performance for automatic emotion prediction is compared across different neural network architectures and fusion approaches (feature-level/decision-level). The results show that: (i) LSTM networks can deal with the (asynchronous) dependencies found between continuous ratings of emotion with video data, (ii) the prediction of emotional valence requires a longer analysis window than arousal, and (iii) decision-level fusion leads to better performance than feature-level fusion. The best performance (concordance correlation coefficient) for multimodal emotion prediction is 0.804 for arousal and 0.528 for valence.
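
The concordance correlation coefficient (CCC) used to report these results rewards predictions that both correlate with and match the scale of the gold-standard rating trace. A minimal sketch of the metric, assuming NumPy and illustrative array names not taken from the paper, is:

    import numpy as np

    def concordance_correlation_coefficient(gold, pred):
        """CCC between two 1-D continuous rating sequences."""
        gold = np.asarray(gold, dtype=float)
        pred = np.asarray(pred, dtype=float)
        mean_g, mean_p = gold.mean(), pred.mean()
        var_g, var_p = gold.var(), pred.var()
        # Covariance between gold standard and prediction
        cov = np.mean((gold - mean_g) * (pred - mean_p))
        # Penalises low correlation as well as mean/variance mismatch
        return 2 * cov / (var_g + var_p + (mean_g - mean_p) ** 2)

    # Illustrative use with hypothetical arousal ratings and predictions
    gold = np.array([0.1, 0.3, 0.5, 0.4, 0.2])
    pred = np.array([0.0, 0.2, 0.6, 0.5, 0.1])
    print(concordance_correlation_coefficient(gold, pred))

A CCC of 1 indicates perfect agreement; unlike the Pearson correlation, it decreases when predictions are biased or mis-scaled, which is why it is commonly used to evaluate continuous emotion prediction.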

Type
research article
DOI
10.1016/j.patrec.2014.11.007
Web of Science ID
WOS:000362271100004
Author(s)
Ringeval, Fabien
•
Eyben, Florian
•
Kroupi, Eleni  
•
Yuce, Anil  
•
Thiran, Jean-Philippe  
•
Ebrahimi, Touradj  
•
Lalanne, Denis
•
Schuller, Bjoern
Date Issued
2015
Publisher
Elsevier
Published in
Pattern Recognition Letters
Volume
66
Start page
22
End page
30
Subjects
  • Context-learning long short-term memory recurrent neural networks
  • Audiovisual and physiological data
  • Continuous affect analysis
  • Multi-task learning
  • Multi-time resolution features extraction
  • Multimodal fusion

Peer reviewed
NON-REVIEWED
Written at
EPFL
EPFL units
LTS5  
GR-EB  
Available on Infoscience
November 20, 2014
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/108995