Audio-visual reliability estimates using stream entropy for speech recognition

We present a method for multimodal fusion based on the estimated reliability of each individual modality. Our method uses an information-theoretic measure, the entropy of the state probability distribution of each stream, as an estimate of that stream's reliability. We apply it to audio-visual speech recognition: the two modalities, audio and video, are weighted at each time instant according to their estimated reliability. The weights therefore vary dynamically and can adapt to any type of noise in either modality and, more importantly, to unexpected variations in the noise level.
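A minimal sketch of per-frame, entropy-based stream weighting, in Python. The abstract does not give the exact formulas, so several parts here are assumptions: the entropy is the Shannon entropy (in nats) of one stream's state posteriors at a single frame, the weights are set inversely proportional to each stream's entropy and normalized to sum to one, and the streams are fused with a weighted sum of per-state log-likelihoods. The function names are hypothetical, and this is not presented as the paper's implementation.

```python
import math
from typing import List, Sequence, Tuple


def stream_entropy(posteriors: Sequence[float]) -> float:
    """Shannon entropy (nats) of one stream's state posteriors at one frame.

    A peaked distribution (confident stream) gives low entropy; a flat one
    (e.g. audio under heavy noise) gives high entropy.
    """
    total = sum(posteriors)
    if total <= 0:
        raise ValueError("posteriors must have positive total mass")
    h = 0.0
    for p in posteriors:
        q = p / total  # renormalize defensively
        if q > 0:  # 0 * log 0 is taken as 0
            h -= q * math.log(q)
    return h


def reliability_weights(
    h_audio: float, h_video: float, eps: float = 1e-12
) -> Tuple[float, float]:
    """ASSUMED mapping: weight each stream inversely to its entropy.

    The two weights sum to 1; eps avoids division by zero when a stream's
    posterior is fully peaked (entropy 0).
    """
    inv_a = 1.0 / (h_audio + eps)
    inv_v = 1.0 / (h_video + eps)
    w_audio = inv_a / (inv_a + inv_v)
    return w_audio, 1.0 - w_audio


def fused_log_likelihood(
    log_p_audio: Sequence[float],
    log_p_video: Sequence[float],
    w_audio: float,
    w_video: float,
) -> List[float]:
    """ASSUMED fusion rule: weighted sum of per-state stream log-likelihoods."""
    return [w_audio * la + w_video * lv for la, lv in zip(log_p_audio, log_p_video)]


if __name__ == "__main__":
    audio_post = [0.25, 0.25, 0.25, 0.25]  # flat: audio looks unreliable
    video_post = [0.97, 0.01, 0.01, 0.01]  # peaked: video looks reliable
    w_a, w_v = reliability_weights(stream_entropy(audio_post), stream_entropy(video_post))
    print(f"audio weight {w_a:.3f}, video weight {w_v:.3f}")
```

In this example the flat audio posterior has the higher entropy, so the video stream receives most of the weight for that frame; recomputing the weights at every frame is what lets them track changing noise. Inverse-entropy weighting is only one plausible mapping; any monotonically decreasing function of entropy would express the same idea.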


Year:
2009




 Record created 2009-09-10, last modified 2018-03-17
