report
Audio-visual reliability estimates using stream entropy for speech recognition
2009
We present a method for multimodal fusion based on the estimated reliability of each individual modality. As the reliability estimate, our method uses an information-theoretic measure: the entropy of the state probability distribution of each stream. We apply the method to audio-visual speech recognition, where the two modalities, audio and video, are weighted at each time instant according to their reliability. The weights therefore vary dynamically, allowing the system to adapt to any type of noise in either modality and, more importantly, to unexpected variations in the noise level.
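The idea can be illustrated with a minimal sketch. It makes two assumptions that the abstract does not state. First, it uses the common multi-stream formulation in which per-stream log-likelihoods are combined as a weighted sum. Second, entropy is mapped to a weight by a hypothetical inverse-entropy rule normalized so that the weights sum to 1. This mapping is chosen only for illustration and is not necessarily the report's. The idea it demonstrates is that a peaked (low-entropy) state posterior signals a reliable stream, while a flat (high-entropy) one signals an unreliable stream. The function names `entropy`, `stream_weights`, and `fused_log_likelihood` are hypothetical.

```python
import math


def entropy(posteriors):
    """Shannon entropy (in nats) of a discrete state posterior distribution."""
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)


def stream_weights(audio_post, video_post, eps=1e-6):
    """Hypothetical reliability-to-weight mapping (an assumption, not the report's rule):
    weight each stream by its inverse entropy, then normalize so the weights sum to 1.
    eps avoids division by zero when a posterior is fully peaked (entropy 0)."""
    inv_a = 1.0 / (entropy(audio_post) + eps)
    inv_v = 1.0 / (entropy(video_post) + eps)
    total = inv_a + inv_v
    return inv_a / total, inv_v / total


def fused_log_likelihood(audio_ll, video_ll, audio_post, video_post):
    """Per-state fused score for one time frame:
    w_A * log p_A(state) + w_V * log p_V(state), with weights recomputed from
    this frame's posteriors so that they change over time."""
    w_a, w_v = stream_weights(audio_post, video_post)
    return [w_a * la + w_v * lv for la, lv in zip(audio_ll, video_ll)]
```

Because `fused_log_likelihood` is called once per frame, a burst of acoustic noise that flattens the audio posterior immediately shifts weight toward the video stream, and the weight shifts back once the noise subsides. For example, with a peaked audio posterior `[0.9, 0.05, 0.05]` and a uniform video posterior `[1/3, 1/3, 1/3]`, the audio stream receives the larger weight.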
Type
report
Author(s)
Gurban, Mihai
Date Issued
2009
Written at
EPFL
Available on Infoscience
September 10, 2009