000149405 001__ 149405
000149405 005__ 20180913055815.0
000149405 037__ $$aCONF
000149405 245__ $$aOvercoming Asynchrony in Audio-Visual Speech Recognition
000149405 269__ $$a2010
000149405 260__ $$c2010
000149405 336__ $$aConference Papers
000149405 520__ $$aIn this paper we propose two alternatives to overcome the natural asynchrony of modalities in Audio-Visual Speech Recognition. We first investigate the use of asynchronous statistical models based on Dynamic Bayesian Networks with different levels of asynchrony. We show that audio-visual models should consider asynchrony within word boundaries and not at the phoneme level. The second approach to the problem introduces additional processing of the features before they are used for recognition. The proposed technique aligns the temporal evolution of the audio and video streams in terms of a speech-recognition system and enables the use of simpler statistical models for classification. In both cases we report experiments with the CUAVE database, showing the improvements obtained with the proposed asynchronous model and feature processing technique compared to traditional systems.
000149405 6531_ $$aLTS5
000149405 700__ $$0242935$$aEstellers Casas, Virginia$$g182750
000149405 700__ $$0240323$$aThiran, Jean-Philippe$$g115534
000149405 7112_ $$aMMSP$$cSaint Malo, France$$dOctober 4-6, 2010
000149405 773__ $$tProceedings of Multimedia Signal Processing Conference
000149405 8564_ $$s166836$$uhttps://infoscience.epfl.ch/record/149405/files/1569323743.pdf$$yn/a$$zn/a
000149405 909C0 $$0252394$$pLTS5$$xU10954
000149405 909CO $$ooai:infoscience.tind.io:149405$$pconf$$pSTI
000149405 917Z8 $$x182750
000149405 937__ $$aEPFL-CONF-149405
000149405 973__ $$aEPFL$$rREVIEWED$$sPUBLISHED
000149405 980__ $$aCONF