Multi-pose lipreading and audio-visual speech recognition

Estellers Casas, Virginia; Thiran, Jean-Philippe

doi:10.1186/1687-6180-2012-51

2012

Abstract

In this article, we study the adaptation of visual and audio-visual speech recognition systems to non-ideal visual conditions. We focus on overcoming the effects of a changing pose of the speaker, a problem encountered in natural situations where the speaker moves freely and does not keep a frontal pose relative to the camera. To handle these situations, we introduce a pose normalization block in a standard system and generate virtual frontal views from non-frontal images. The proposed method is inspired by pose-invariant face recognition and relies on linear regression to find an approximate mapping between images from different poses. We integrate the proposed pose normalization block at different stages of the speech recognition system and quantify the loss of performance related to pose changes and pose normalization techniques. In audio-visual experiments, we also analyze the integration of the audio and visual streams. We show that an audio-visual system should account for non-frontal poses and normalization techniques in terms of the weight assigned to the visual stream in the classifier.
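
The two ideas in the abstract, a linear-regression mapping between poses and a weight on the visual stream, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the small ridge term added for numerical stability, and the convention that the audio and visual weights sum to one are all assumptions made for illustration.

```python
import numpy as np


def fit_pose_mapping(side, frontal, ridge=1e-3):
    """Fit W so that frontal is approximately [side, 1] @ W.

    side, frontal: arrays of shape (n_samples, n_features) holding
    time-aligned feature vectors (or vectorized mouth images) of the
    same speech seen from a non-frontal and a frontal pose.
    The ridge term is an assumption, added only for numerical stability.
    """
    X = np.hstack([side, np.ones((side.shape[0], 1))])  # append bias column
    # Regularized normal equations: (X^T X + ridge * I) W = X^T Y
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ frontal)


def normalize_pose(side, W):
    """Generate virtual frontal features from non-frontal ones."""
    X = np.hstack([side, np.ones((side.shape[0], 1))])
    return X @ W


def fuse_log_likelihoods(log_audio, log_visual, visual_weight):
    """Weighted stream combination (assumed convention: weights sum to one)."""
    return (1.0 - visual_weight) * log_audio + visual_weight * log_visual
```

In such a setup, a lower `visual_weight` would be the natural choice when the visual stream comes from a non-frontal or pose-normalized view and is therefore less reliable than a true frontal view.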

Details

Title Multi-pose lipreading and audio-visual speech recognition

Author(s) Estellers Casas, Virginia ; Thiran, Jean-Philippe

Published in EURASIP Journal on Advances in Signal Processing

Volume 51

Pages 1-23

Date 2012

Keywords

LTS5, lipreading

Language English

DOI https://doi.org/10.1186/1687-6180-2012-51

Laboratories LTS5

Record Appears in Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LTS5 - Signal Processing Laboratory 5
Peer-reviewed publications
Work produced at EPFL
Journal Articles
Published

Record creation date 2012-02-19
