Infoscience

report

Speech Enhancement and Recognition in Meetings with an Audio-Visual Sensor Array

Maganti, Hari Krishna • Gatica-Perez, Daniel • McCowan, Iain A.
2006

We address the problem of distant speech acquisition in multi-party meetings using multiple microphones and cameras. Microphone array beamforming offers a potential alternative to close-talking microphones by providing speech enhancement through spatial filtering and directional discrimination. Beamforming relies on knowledge of the speaker location. In this paper, we present an integrated approach in which an audio-visual multi-person tracker is used to track active speakers with high accuracy. Speech enhancement is then achieved using microphone array beamforming followed by a novel post-filtering stage. Finally, speech recognition is performed to evaluate the quality of the enhanced speech signal. The approach is evaluated on data recorded in a real meeting room for stationary-speaker, moving-speaker, and overlapping-speech scenarios. The results show that the speech enhancement and recognition performance achieved with our approach is significantly better than that of a single table-top microphone and comparable to that of a lapel microphone in all scenarios. The results also indicate that the audio-visual system performs significantly better than an audio-only system in terms of both enhancement and recognition. This shows that the accurate speaker tracking provided by the audio-visual sensor array improves recognition performance in a microphone-array-based speech recognition system.
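To illustrate why beamforming depends on knowing the speaker location, the following is a minimal sketch of a generic delay-and-sum beamformer. It is not the method of the report: the abstract does not specify the beamformer or the post-filter, so delay-and-sum is used here only as the simplest stand-in. The function names `steering_delays` and `delay_and_sum`, the 2-D coordinates, the integer-sample delays, and the speed-of-sound constant are all assumptions made for illustration. In a system like the one described, the source position would come from the audio-visual tracker; here it is passed in as a plain coordinate.

```python
import math

# Assumed speed of sound in air at room temperature (m/s).
SPEED_OF_SOUND = 343.0


def steering_delays(mic_positions, source_position, sample_rate):
    """Return per-microphone delays in whole samples.

    Delays are measured relative to the microphone closest to the
    source, so the nearest microphone always gets a delay of 0.
    """
    distances = [math.dist(mic, source_position) for mic in mic_positions]
    nearest = min(distances)
    return [
        round((d - nearest) / SPEED_OF_SOUND * sample_rate)
        for d in distances
    ]


def delay_and_sum(channels, delays):
    """Align the channels on the source and average them.

    Each channel is advanced by its delay so that the wavefront from
    the assumed source position lines up across microphones. Sound
    from that position adds coherently; sound from elsewhere does not.
    """
    # Only keep samples for which every shifted channel has data.
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(usable)
    ]
```

If the position estimate is wrong, the computed delays no longer align the channels and the desired speech is attenuated along with the noise, which is one way to see why more accurate tracking could help the downstream recognizer. Rounding to whole samples keeps the sketch short; a practical implementation would typically use fractional delays or frequency-domain steering.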

Type: report
Author(s): Maganti, Hari Krishna; Gatica-Perez, Daniel; McCowan, Iain A.
Date Issued: 2006
Publisher: IDIAP
Subjects: speech
URL: http://publications.idiap.ch/downloads/reports/2006/rr06-24.pdf
Written at: EPFL
EPFL units: LIDIAP
Available on Infoscience: June 8, 2006
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/230370
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved