Selecting relevant visual features for speechreading

Estellers, Virginia; Gurban, Mihai; Thiran, Jean-Philippe

doi:10.1109/ICIP.2009.5414563


2009

Abstract

We propose a quantitative measure of relevance for constructing visual feature sets that are both relevant and compact. A feature's relevance is the amount of information it carries about the problem, while compactness is achieved by preventing information from being replicated across features in the set. To achieve both goals, we use mutual information to assess relevance and also to measure the redundancy between features. Our application is speechreading, that is, speech recognition performed on video of the speaker. Speechreading is worth studying because the performance of audio speech recognition can be improved by augmenting the audio features with visual ones, especially when the audio channel is noisy. We report significant improvements over linear discriminant analysis, the most commonly used method of dimensionality reduction for speechreading.
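The abstract does not spell out how the relevance and redundancy terms are combined. As a rough illustration only, the sketch below uses one common greedy criterion of this kind (the "minimum redundancy, maximum relevance" rule, swapped in here, not confirmed as the paper's method): at each step it adds the candidate feature f that maximizes I(f;C) minus the mean of I(f;s) over the already-selected features s, where C is the class label. It assumes discretized features and a plug-in (histogram) estimate of mutual information; real visual speech features are typically continuous and would first need quantizing. The function names `mutual_information` and `select_features` and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def select_features(features, labels, k):
    """Greedily pick k feature names, trading relevance against redundancy.

    features: dict mapping a (hypothetical) feature name to its discretized values.
    labels:   class label per sample (assumed discrete).
    """
    # Relevance: information each feature carries about the class.
    relevance = {name: mutual_information(vals, labels) for name, vals in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            # Redundancy: average information shared with features already chosen.
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: the class is determined by two independent bits x1 and x2.
x1 = [0, 0, 0, 0, 1, 1, 1, 1]
x2 = [0, 0, 1, 1, 0, 0, 1, 1]
labels = [2 * a + b for a, b in zip(x1, x2)]
features = {"a": x1, "b": list(x1), "c": x2}  # "b" duplicates "a"
print(select_features(features, labels, 2))  # ['a', 'c']
```

Each of the three toy features carries exactly 1 bit about the class, so ranking by relevance alone cannot separate them; it is the redundancy term that rejects the duplicate `b` once `a` is in the set. Note that LDA, the baseline named above, works differently: it projects the features onto a lower-dimensional space rather than selecting a subset of them.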

Details

Title Selecting relevant visual features for speechreading

Author(s) Estellers, Virginia ; Gurban, Mihai ; Thiran, Jean-Philippe

Published in Proceedings of the IEEE International Conference on Image Processing

Pages 1433-1436

Conference IEEE International Conference on Image Processing, Cairo, November 7-11, 2009

Date 2009

Place of publication Cairo, Egypt

Keywords

LTS5; Feature extraction; image processing; speech recognition

DOI https://doi.org/10.1109/ICIP.2009.5414563

Laboratories LTS5

Record Appears in Scientific production and competences > STI - School of Engineering > IEM - Institut d'Electricité et de Microtechnique > LTS5 - Signal Processing Laboratory 5
Peer-reviewed publications
Conference Papers
Work produced at EPFL
Published

Record creation date 2009-05-29