Low-Dimensional Motion Features for Audio-Visual Speech Recognition

Valles, A.; Gurban, M.; Thiran, Jean-Philippe

2007

Abstract

Audio-visual speech recognition promises to improve the performance of speech recognizers, especially when the audio is corrupted, by adding information from the visual modality, specifically from video of the speaker. However, the number of visual features added is typically larger than the number of audio features, for only a small gain in accuracy. We present a method whose performance gains are comparable to those of the commonly used DCT features while employing far fewer visual features, which are based on the motion of the speaker’s mouth. Motion vector differences are used to compensate for errors in mouth tracking. This yields good performance with as few as 3 features. The advantage of low-dimensional features is that good accuracy can be obtained with relatively little training data, while training and testing also become faster.

Details

Title Low-Dimensional Motion Features for Audio-Visual Speech Recognition

Author(s) Valles, A. ; Gurban, M. ; Thiran, Jean-Philippe

Published in 15th European Signal Processing Conference (EUSIPCO)

Conference 15th European Signal Processing Conference (EUSIPCO), Poznan, Poland, September 3-7, 2007

Date 2007

Publisher Poznan, Poland

Keywords

LTS5

Laboratories LTS5

Record Appears in Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LTS5 - Signal Processing Laboratory 5
Peer-reviewed publications
Conference Papers
Work produced at EPFL
Published

Record creation date 2007-07-18
