Infoscience
EPFL, École polytechnique fédérale de Lausanne
Multi-Modal Recurrent Attention Networks for Facial Expression Recognition

Lee, Jiyoung • Kim, Sunok • Kim, Seungryong • Sohn, Kwanghoon
January 1, 2020
IEEE Transactions on Image Processing

Recent deep neural network-based methods have achieved state-of-the-art performance on various facial expression recognition tasks. Despite this progress, previous research on facial expression recognition has focused mainly on analyzing color video recordings alone. However, the complex emotions that people with different skin colors express through dynamic facial expressions under varying lighting conditions can be fully understood only by integrating information from multi-modal videos. We present a novel method to estimate dimensional emotion states, where color, depth, and thermal video recordings are used as a multi-modal input. Our networks, called multi-modal recurrent attention networks (MRAN), learn spatiotemporal attention volumes to robustly recognize facial expressions from attention-boosted feature volumes. We leverage the depth and thermal sequences as guidance priors that let the color sequence selectively focus on emotionally discriminative regions. We also introduce a novel benchmark for multi-modal facial expression recognition, termed multi-modal arousal-valence facial expression recognition (MAVFER), which consists of color, depth, and thermal video recordings with corresponding continuous arousal-valence scores. The experimental results show that our method achieves state-of-the-art results in dimensional facial expression recognition on color recording datasets, including RECOLA, SEWA, and AFEW, as well as on the multi-modal MAVFER dataset.
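The core idea the abstract describes, using auxiliary modalities (depth, thermal) as guidance priors that attention-weight the color features, can be illustrated with a toy sketch. This is not the paper's MRAN architecture; all function names and values below are hypothetical, and the real method operates on learned spatiotemporal feature volumes, not scalar region scores.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_fused(color, depth, thermal):
    """Toy guidance-prior attention (hypothetical, not the paper's MRAN):
    depth and thermal responses score each spatial region, and the
    resulting softmax weights re-weight the color features."""
    # Score each region by the average of its auxiliary-modality responses.
    scores = [(d + t) / 2.0 for d, t in zip(depth, thermal)]
    weights = softmax(scores)
    # Attention-boosted color features: element-wise re-weighting.
    return [w * c for w, c in zip(weights, color)]

# Example: four spatial regions; the strong depth/thermal response at
# region 2 boosts that region's contribution to the fused features.
color   = [1.0, 1.0, 1.0, 1.0]
depth   = [0.1, 0.2, 2.0, 0.1]
thermal = [0.0, 0.3, 1.8, 0.2]
fused = attention_fused(color, depth, thermal)
```

In this sketch the attention weights sum to one and peak wherever the guidance modalities respond most strongly, which is the intuition behind focusing the color stream on emotionally discriminative regions.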

Type
research article
DOI
10.1109/TIP.2020.2996086
Web of Science ID
WOS:000546910100006
Author(s)
Lee, Jiyoung • Kim, Sunok • Kim, Seungryong • Sohn, Kwanghoon
Date Issued
2020-01-01
Publisher
IEEE - Institute of Electrical and Electronics Engineers
Published in
IEEE Transactions on Image Processing
Volume
29
Start page
6977
End page
6991
Subjects
Computer Science, Artificial Intelligence • Engineering, Electrical & Electronic • Computer Science • Engineering • face recognition • image color analysis • videos • emotion recognition • benchmark testing • databases • task analysis • multi-modal facial expression recognition • dimensional (continuous) emotion recognition • attention mechanism • database • emotion

Editorial or Peer reviewed
REVIEWED
Written at
EPFL
EPFL units
CVLAB
Available on Infoscience
July 23, 2020
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/170305
Contact: infoscience@epfl.ch

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.