Two-level bimodal association for audio-visual speech recognition

Lee, Jong-Seok; Ebrahimi, Touradj

doi:10.1007/978-3-642-04697-1_13

conference paper

This paper proposes a new method for bimodal information fusion in audio-visual speech recognition, in which cross-modal association is considered at two levels. First, the acoustic and visual data streams are combined at the feature level using canonical correlation analysis, which addresses audio-visual synchronization and exploits the cross-modal correlation. Second, the information streams are integrated at the decision level, so that the fusion adapts to the noise condition of the given speech datum. Experimental results demonstrate that the proposed method yields noise-robust recognition performance without a priori knowledge of the noise conditions of the speech data.
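
The two levels named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cca` and `fuse_decisions`, the regularisation term `reg`, and the fixed `audio_weight` are assumptions made for illustration, and the record does not state how the paper derives its adaptive weight or handles frame-rate alignment. The first function computes textbook canonical correlation analysis on two already time-aligned feature matrices; the second combines per-class log-likelihoods from the two streams with a convex weight.

```python
import numpy as np


def cca(X, Y, reg=1e-6):
    """Textbook CCA for two views X (n, p) and Y (n, q), rows paired in time.

    Returns projection matrices Wx, Wy and the canonical correlations.
    `reg` is a small ridge term (an assumption) that keeps the covariance
    matrices positive definite.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    # Whiten each view with its Cholesky factor: T = Lx^{-1} Sxy Ly^{-T}.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T

    # Singular values of T are the canonical correlations.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)

    # Map the singular vectors back to the original feature spaces.
    Wx = np.linalg.solve(Lx.T, U)
    Wy = np.linalg.solve(Ly.T, Vt.T)
    return Wx, Wy, s


def fuse_decisions(audio_loglik, visual_loglik, audio_weight):
    """Weighted sum of per-class log-likelihoods from the two streams.

    `audio_weight` in [0, 1] is a hypothetical reliability weight; a lower
    value would be used when the audio is noisier. Returns the index of the
    winning class and the fused scores.
    """
    a = np.asarray(audio_loglik, dtype=float)
    v = np.asarray(visual_loglik, dtype=float)
    fused = audio_weight * a + (1.0 - audio_weight) * v
    return int(np.argmax(fused)), fused
```

In this sketch the projected features `X @ Wx` and `Y @ Wy` would feed the recognisers, and `fuse_decisions` would be called once per utterance with whatever weight the noise estimate suggests.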

Type

conference paper

DOI

10.1007/978-3-642-04697-1_13

Web of Science ID

WOS:000279102300013

Author(s)

Lee, Jong-Seok

Ebrahimi, Touradj

Editors

Blanc-Talon, J.

Date Issued

2009

Publisher

Springer-Verlag

Published in

Proceedings of the International Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS’09)

Series title/Series vol.

Lecture Notes in Computer Science; 5807

Start page

133

End page

144

Subjects

audio-visual speech recognition • synchronization • cross-modal correlation • canonical correlation analysis

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units

GR-EB

Event name

International Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS’09)

Event place

Bordeaux, France

Available on Infoscience

July 2, 2009

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/41020