Audio-visual reliability estimates using stream entropy for speech recognition

We present a method for multimodal fusion based on the estimated reliability of each individual modality. As its reliability estimate, the method uses an information-theoretic measure: the entropy of each stream's state probability distribution. We apply it to audio-visual speech recognition, where the two modalities, audio and video, are weighted at each time instant according to their reliability. The weights therefore vary dynamically and can adapt to any type of noise in each modality and, more importantly, to unexpected variations in the noise level.
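As an illustration of the idea only, the sketch below computes the entropy of each stream's state posterior at a single time instant and turns the two entropies into fusion weights. The abstract does not specify how entropy is mapped to weights, so the inverse-proportional mapping used here, the weighted sum of per-state log-likelihoods, and all function names (`entropy`, `stream_weights`, `fuse_log_likelihoods`) are assumptions made for this example, not the report's formulation.

```python
import math


def entropy(posterior):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in posterior if p > 0.0)


def stream_weights(audio_post, video_post):
    """Return (w_audio, w_video), two weights that sum to 1.

    Assumed mapping (not taken from the report): each stream's weight is
    proportional to the OTHER stream's entropy, so the stream with the
    more peaked (lower-entropy) posterior receives the larger weight.
    """
    h_audio = entropy(audio_post)
    h_video = entropy(video_post)
    total = h_audio + h_video
    if total == 0.0:
        # Both posteriors are one-hot: treat the streams as equally reliable.
        return 0.5, 0.5
    w_audio = h_video / total
    return w_audio, 1.0 - w_audio


def fuse_log_likelihoods(audio_ll, video_ll, w_audio, w_video):
    """Weighted sum of per-state log-likelihoods from the two streams."""
    return [w_audio * a + w_video * v for a, v in zip(audio_ll, video_ll)]


if __name__ == "__main__":
    # Hypothetical posteriors over three states at one time instant.
    audio_post = [0.90, 0.05, 0.05]   # peaked: audio looks reliable
    video_post = [0.40, 0.30, 0.30]   # flat: video looks unreliable
    w_a, w_v = stream_weights(audio_post, video_post)
    print("audio weight:", w_a, "video weight:", w_v)
```

Recomputing the weights at every frame is what lets them follow changes in the noise level: when noise flattens one stream's posterior, its entropy rises and its weight drops at that frame.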

    Keywords: LTS5


    • EPFL-REPORT-140894

    Record created on 2009-09-10, modified on 2017-05-10

