Learning How to Smile: Expression Video Generation With Conditional Adversarial Recurrent Nets

Wang, Wei; Alameda-Pineda, Xavier; Xu, Dan; Ricci, Elisa; Sebe, Nicu

doi:10.1109/TMM.2019.2963621

2020

Abstract

While several research studies have focused on analyzing human behavior and, in particular, emotional signals from visual data, the problem of synthesizing face video sequences with specific attributes (e.g., age, facial expressions) has received much less attention. This paper proposes a novel deep generative model able to produce face videos from a given image of a neutral face and a label indicating a specific facial expression, e.g., a spontaneous smile. Our framework consists of two main building blocks: an image generator and a frame sequence generator. The image generator is implemented as a deep neural model which combines generative adversarial networks and variational auto-encoders, while the sequence generator is a label-conditioned recurrent neural network. In the proposed framework, given as input a neutral face and a label, the sequence generator outputs a set of hidden representations with smooth transitions corresponding to video frames. Then, the image generator is used to decode the hidden representations into the actual face images. To ensure that the network generates videos consistent with the given label, a novel identity adversarial loss is proposed. Our experimental results demonstrate the effectiveness of the framework and the advantage of introducing an adversarial component into recurrent models for face video generation.
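The data flow described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a plain GRU cell that receives a one-hot expression label at every step, and it replaces the VAE-GAN image generator with a single random linear layer followed by a sigmoid. All names (`ConditionalGRUCell`, `generate_video`, `w_decoder`), the dimensions, and the random weights are hypothetical; training and the adversarial losses are omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vec):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def rand_matrix(rows, cols, rng, scale=0.1):
    # Random weights stand in for trained parameters (assumption).
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class ConditionalGRUCell:
    """Hypothetical GRU cell conditioned on a one-hot expression label."""

    def __init__(self, hidden_dim, num_labels, rng):
        in_dim = hidden_dim + num_labels
        self.w_update = rand_matrix(hidden_dim, in_dim, rng)
        self.w_reset = rand_matrix(hidden_dim, in_dim, rng)
        self.w_cand = rand_matrix(hidden_dim, in_dim, rng)

    def step(self, h, label):
        x = h + label  # concatenate previous hidden code and label
        z = [sigmoid(a) for a in matvec(self.w_update, x)]  # update gate
        r = [sigmoid(a) for a in matvec(self.w_reset, x)]   # reset gate
        gated = [ri * hi for ri, hi in zip(r, h)] + label
        cand = [math.tanh(a) for a in matvec(self.w_cand, gated)]
        # Blend old state and candidate, so consecutive codes change gradually.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def generate_video(z0, label, num_frames, cell, w_decoder):
    # Sequence generator: roll the hidden code forward, one code per frame.
    codes = [z0]
    for _ in range(num_frames - 1):
        codes.append(cell.step(codes[-1], label))
    # Image generator stand-in: decode each code into pixel values in (0, 1).
    frames = [[sigmoid(a) for a in matvec(w_decoder, c)] for c in codes]
    return codes, frames


rng = random.Random(0)
hidden_dim, num_labels, num_pixels, num_frames = 8, 2, 16, 5
cell = ConditionalGRUCell(hidden_dim, num_labels, rng)
w_decoder = rand_matrix(num_pixels, hidden_dim, rng)
z0 = [rng.uniform(-1.0, 1.0) for _ in range(hidden_dim)]  # code of the neutral face
smile_label = [1.0, 0.0]  # hypothetical one-hot label for "smile"
codes, frames = generate_video(z0, smile_label, num_frames, cell, w_decoder)
print(len(frames), len(frames[0]))  # prints "5 16"
```

In this sketch the first hidden code plays the role of the encoded neutral face, and each later code is decoded independently into one frame, which mirrors the two-stage structure the abstract describes.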

Details

Title Learning How to Smile: Expression Video Generation With Conditional Adversarial Recurrent Nets

Author(s) Wang, Wei ; Alameda-Pineda, Xavier ; Xu, Dan ; Ricci, Elisa ; Sebe, Nicu

Published in IEEE Transactions On Multimedia

Volume 22

Issue 11

Pages 2808-2819

Date 2020-11-01

ISSN 1520-9210
1941-0077

Keywords

face; generators; generative adversarial networks; manifolds; solid modeling; three-dimensional displays; visualization; video generation; gated recurrent unit; smile

DOI https://doi.org/10.1109/TMM.2019.2963621

Laboratories CVLAB

Record creation date 2020-12-16
