Mutual information eigenlips for audio-visual speech recognition

This paper proposes an information-theoretic approach for finding the most informative subset of eigenfeatures for audio-visual speech recognition tasks. State-of-the-art visual feature extraction methods in speechreading rely on pixel-based methods, geometry-based methods, or a combination of the two. However, there is no common rule defining how these features should be selected with respect to the chosen set of audio cues, or how well they represent the classes of the uttered speech. Our main objective is to exploit the complementarity of the audio and visual sources and to select meaningful visual descriptors by means of mutual information. We focus on the principal component projections of the mouth-region images and apply the proposed method so that only those cues with the highest mutual information with the word classes are retained. The algorithm is tested in various speech recognition experiments on a chosen audio-visual dataset. The obtained recognition rates are compared to those acquired using conventional principal component analysis, and promising results are shown.
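As a rough illustration of the selection scheme described in the abstract (not the authors' implementation), the Python sketch below projects flattened mouth-region images onto their leading principal components and keeps only the components whose estimated mutual information with the word-class labels is highest. The function name, the parameter values, and the use of scikit-learn's PCA and mutual-information estimators are all assumptions made for illustration.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_classif

def select_eigenlips(mouth_images, word_labels, n_components=64, n_keep=20):
    # Hypothetical sketch: project flattened mouth-region images onto the
    # leading principal components ("eigenlips").
    pca = PCA(n_components=n_components)
    projections = pca.fit_transform(mouth_images)
    # Estimate I(component; word class) for each projection coefficient.
    mi = mutual_info_classif(projections, word_labels, random_state=0)
    # Retain only the components most informative about the word classes.
    keep = np.argsort(mi)[::-1][:n_keep]
    return projections[:, keep], keep

# Usage with synthetic data (real inputs would be mouth-region images
# and word-class labels from an audio-visual corpus):
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32 * 32))  # 200 synthetic 32x32 mouth images
y = rng.integers(0, 10, size=200)    # 10 hypothetical word classes
features, kept = select_eigenlips(X, y)
print(features.shape)  # (200, 20)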


Published in: 14th European Signal Processing Conference (EUSIPCO)
Year: 2006
Publisher: IEEE