Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. Reports, Documentation, and Standards
  4. Speech Enhancement and Recognition in Meetings with an Audio-Visual Sensor Array
 
report

Speech Enhancement and Recognition in Meetings with an Audio-Visual Sensor Array

Maganti, Hari Krishna
•
Gatica-Perez, Daniel  
•
McCowan, Iain A.
2006

We address the problem of distant speech acquisition in multi-party meetings, using multiple microphones and cameras. Microphone array beamforming techniques present a potential alternative to close-talking microphones by providing speech enhancement through spatial filtering and directional discrimination. Beamforming techniques rely on the knowledge of a speaker location. In this paper, we present an integrated approach, in which an audio-visual multi-person tracker is used to track active speakers with high accuracy. Speech enhancement is then achieved using microphone array beamforming followed by a novel post-filtering stage. Finally, speech recognition is performed to evaluate the quality of the enhanced speech signal. The approach is evaluated on the data recorded in a real meeting room for stationary speaker, moving speaker and overlapping speech scenarios. The results show that the speech enhancement and recognition performance, achieved using our approach are significantly better than single table-top microphone and comparable to lapel microphone for all the scenarios. The results also indicate that the audio-visual based system performs significantly better than audio-only system, both in terms of enhancement and recognition. This reveals that the accurate speaker tracking, provided by the audio-visual sensor array proved beneficial to improve the recognition performance in a microphone array based speech recognition system.

  • Files
  • Details
  • Metrics
Loading...
Thumbnail Image
Name

rr06-24.pdf

Access type

openaccess

Size

1.12 MB

Format

Adobe PDF

Checksum (MD5)

4c436338d4569cb6fc546657ac6ef71e

Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés