Two-level bimodal association for audio-visual speech recognition

This paper proposes a new method for bimodal information fusion in audio-visual speech recognition, in which cross-modal association is considered at two levels. First, the acoustic and visual data streams are combined at the feature level using canonical correlation analysis, which addresses audio-visual synchronization and exploits the cross-modal correlation. Second, the information streams are integrated at the decision level, so that the streams are fused adaptively according to the noise condition of the given speech input. Experimental results demonstrate that the proposed method achieves noise-robust recognition performance without a priori knowledge of the noise conditions of the speech data.
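To make the two-level scheme concrete, the sketch below illustrates how such a pipeline might be assembled; it is not the authors' exact implementation. It uses scikit-learn's CCA for the feature-level step and a hypothetical reliability-weighted log-likelihood combination for the decision-level step. The feature dimensions, the reliability weight, and the combination rule are assumptions for illustration only.

```python
# Illustrative sketch (not the paper's exact pipeline): feature-level fusion
# of audio and visual streams via canonical correlation analysis (CCA),
# followed by a simple reliability-weighted decision-level combination.
import numpy as np
from sklearn.cross_decomposition import CCA

# Synthetic stand-ins for per-frame acoustic and visual features
# (e.g., MFCCs and lip-region features); dimensions are assumptions.
rng = np.random.default_rng(0)
n_frames = 500
audio_feats = rng.normal(size=(n_frames, 39))   # e.g., 13 MFCCs + deltas
visual_feats = rng.normal(size=(n_frames, 30))  # e.g., lip-ROI features

# --- Level 1: feature-level fusion via CCA ---------------------------------
# Project both streams onto maximally correlated canonical directions and
# concatenate the projections into a single bimodal feature vector.
cca = CCA(n_components=10)
audio_c, visual_c = cca.fit_transform(audio_feats, visual_feats)
bimodal_feats = np.hstack([audio_c, visual_c])   # shape: (n_frames, 20)

# --- Level 2: decision-level fusion ----------------------------------------
# Combine per-stream classifier scores with stream weights chosen from an
# estimate of the acoustic noise condition (hypothetical weighting rule).
def fuse_log_likelihoods(ll_audio, ll_visual, audio_reliability):
    """Weighted log-likelihood combination; stream weights sum to 1."""
    w_a = audio_reliability          # in [0, 1]; high when speech is clean
    w_v = 1.0 - w_a
    return w_a * ll_audio + w_v * ll_visual

# Example: hypothetical per-class log-likelihoods for one utterance,
# fused under a noisy-speech condition (low audio reliability).
ll_audio = np.array([-12.3, -10.1, -15.7])
ll_visual = np.array([-11.0, -13.2, -12.5])
fused = fuse_log_likelihoods(ll_audio, ll_visual, audio_reliability=0.4)
predicted_class = int(np.argmax(fused))
```

In this sketch the decision-level weight is supplied directly; in practice it would be derived from an estimate of the noise condition of the input speech, which is the adaptive element described in the abstract.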


Editor(s):
Blanc-Talon, J.
Published in:
Proceedings of the International Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS’09), 133-144
Presented at:
International Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS’09), Bordeaux, France
Year:
2009
Publisher:
Springer-Verlag



