Report

Speech/Non-Speech Detection in Meetings from Automatically Extracted Low Resolution Visual Features

Hung, Hayley • Ba, Silèye O.
2009

In this paper we address the problem of estimating who is speaking in group meetings from automatically extracted low resolution visual cues. Traditionally, the task of speech/non-speech detection or speaker diarization tries to find who speaks and when using audio features only. Recent work has addressed the problem audio-visually, but often with less emphasis on the visual component. Because there is a high probability of losing the audio stream during video conferences, this work proposes methods for estimating speech using only low resolution visual cues. We carry out experiments to compare how context, observed through group behaviour and task-oriented activities, can help improve estimates of speaking status. We test on 105 minutes of natural meeting data with unconstrained conversations.
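The report itself details the features and models used; purely as an illustration (not the authors' method), the sketch below thresholds a smoothed per-person motion-energy signal, the kind of low resolution visual cue the abstract refers to, to produce a frame-level speaking/non-speaking estimate. The function name, window size, and threshold choice are all assumptions.

```python
import numpy as np

def estimate_speaking(motion_energy, window=25, threshold=None):
    """Naive speaking/non-speaking estimate from a per-person motion signal.

    motion_energy: 1-D array, e.g. summed frame-difference magnitude over a
    low-resolution crop of the person, one value per video frame.
    window: smoothing window in frames (25 is roughly 1 s at 25 fps).
    threshold: energy level above which the person is labelled speaking;
    defaults to the median of the smoothed signal.
    """
    kernel = np.ones(window) / window
    smoothed = np.convolve(motion_energy, kernel, mode="same")
    if threshold is None:
        threshold = np.median(smoothed)
    return smoothed > threshold  # True = speaking, False = non-speech

# Toy usage: three people, 1500 frames of synthetic motion energy.
rng = np.random.default_rng(0)
motion = rng.random((3, 1500))
speaking = np.array([estimate_speaking(m) for m in motion])
print(speaking.shape)  # (3, 1500) boolean speaking-status matrix
```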

Files
  • Name: Hung_Idiap-RR-20-2009.pdf
  • Access type: Open access
  • Size: 559.39 KB
  • Format: Adobe PDF
  • Checksum (MD5): 16799bb2e79691af71fce12862d18708
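To confirm that a downloaded copy of the PDF is intact, its MD5 digest can be compared against the checksum listed above. A minimal sketch follows; it assumes the report was saved locally under its original filename.

```python
import hashlib

EXPECTED_MD5 = "16799bb2e79691af71fce12862d18708"  # checksum listed above

def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Assumes the PDF was downloaded to the current directory with its original name.
if md5_of("Hung_Idiap-RR-20-2009.pdf") == EXPECTED_MD5:
    print("Checksum matches: file is intact.")
else:
    print("Checksum mismatch: download may be corrupted.")
```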

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.