A multimodal approach to extract optimized audio features for speaker detection

Besson, P.; Kunt, M.; Butz, T.; Thiran, J.

conference paper

2005

Proceedings of European Signal Processing Conference (EUSIPCO)

We present a method that exploits an information-theoretic framework to extract audio features that are optimal with respect to the video features. A simple measure of mutual information between the resulting audio features and the video features then makes it possible to detect the active speaker among several candidates. The results show that our method is able to exploit the shared speech information contained in audio and video signals to recover their common source.
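The detection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it skips the audio feature optimization entirely and assumes each candidate is already summarized by a one-dimensional video feature sequence aligned with a one-dimensional audio feature sequence. The function names (`mutual_information`, `detect_speaker`), the histogram estimator, the bin count, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Plug-in (histogram) estimate of I(X;Y) in bits for two 1-D sequences.

    Assumption: a simple histogram estimator stands in for whatever
    estimator the paper actually uses.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                # empirical joint distribution
    px = pxy.sum(axis=1, keepdims=True)      # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)      # marginal of y, shape (1, bins)
    nz = pxy > 0                             # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))


def detect_speaker(audio_feature, video_features, bins=8):
    """Return the index of the candidate whose video feature shares the most
    information with the audio feature, together with all scores."""
    scores = [mutual_information(audio_feature, v, bins) for v in video_features]
    return int(np.argmax(scores)), scores


# Synthetic data (hypothetical): one candidate's video feature is correlated
# with the audio feature, the other candidate's is independent noise.
rng = np.random.default_rng(0)
n = 2000
audio = rng.normal(size=n)
video_silent = rng.normal(size=n)
video_speaker = audio + 0.5 * rng.normal(size=n)

best, scores = detect_speaker(audio, [video_silent, video_speaker])
print("detected candidate:", best)
print("scores (bits):", scores)
```

In this toy setting the correlated candidate receives the higher score. With real recordings, the choice of audio and video features and of the mutual information estimator would matter far more than it does here.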

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/231696

Name: Besson2005_1314.pdf

Access type: openaccess

Size: 91.98 KB

Format: Adobe PDF

Checksum (MD5): 77d6f63d59d69e26fad3b19efad3a6ab