Infoscience

doctoral thesis

Spatio-temporal analysis of spontaneous speech with microphone arrays

Lathoud, Guillaume  
2007

Accurate detection, localization and tracking of multiple moving speakers enable a wide spectrum of applications. These applications require techniques that are versatile, robust to environmental variations, and that place no constraints on non-technical end-users. Working from distant recordings of spontaneous multi-party conversations, this thesis uses microphone arrays to address the question "Who spoke where and when?" The speed, versatility and robustness of the proposed techniques are tested on a variety of real indoor recordings, including multiple moving speakers as well as seated speakers in meetings. Optimized implementations are provided in most cases. We propose to discretize the physical space into a few sectors and, for each time frame, to determine which sectors contain active acoustic sources ("Where? When?"). A topological interpretation of beamforming is proposed, which makes it possible both to evaluate the average acoustic energy in a sector at negligible cost and to precisely locate a speaker within an active sector. One additional contribution that goes beyond the field of microphone arrays is a generic, automatic threshold selection method that does not require any training data. On the speaker detection task, the new approach performs dramatically better than the more classical approach in which a threshold is set on training data. We integrate the new approach into a system for multispeaker detection-localization. Another generic contribution is a principled, threshold-free framework for short-term clustering of multispeaker location estimates, which also makes it possible to detect where and when multiple trajectories intersect. On multi-party meeting recordings, using distant microphones only, short-term clustering yields a speaker segmentation performance similar to that of close-talking microphones.
The resulting short speech segments are then grouped into speaker clusters ("Who?"), through an extension of the Bayesian Information Criterion to merge multiple modalities. On meeting recordings, the speaker clustering performance is significantly improved by merging the classical mel-cepstrum information with the short-term speaker location information. Finally, a close analysis of the speaker clustering results suggests that future research should investigate the effect of human acoustic radiation characteristics on the overall transmission channel, when a speaker is a few meters away from a microphone.
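
As a rough illustration of the kind of criterion the abstract refers to, the sketch below computes the textbook ΔBIC merge test between two speech segments, each modeled by a single diagonal-covariance Gaussian. It is not the extension proposed in the thesis, whose details are not given here. Concatenating a cepstral value and a location value into one feature vector is an assumption made only for illustration, and the names `delta_bic`, `diag_variances`, `make_segment` and `penalty_weight`, as well as the synthetic data, are hypothetical.

```python
import math
import random


def diag_variances(frames, floor=1e-6):
    """Per-dimension maximum-likelihood variances, floored to avoid log(0)."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, floor)
        for d in range(dims)
    ]


def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    """BIC(two Gaussians) minus BIC(one Gaussian) for two segments.

    A positive value favors keeping the segments apart (different speakers);
    a negative value favors merging them.
    """
    dims = len(seg_a[0])
    n = len(seg_a) + len(seg_b)

    def half_n_log_det(frames):
        # (N / 2) * log|Sigma| for a diagonal covariance matrix.
        return 0.5 * len(frames) * sum(math.log(v) for v in diag_variances(frames))

    # Log-likelihood gain of two models over one (constant terms cancel).
    gain = (
        half_n_log_det(seg_a + seg_b)
        - half_n_log_det(seg_a)
        - half_n_log_det(seg_b)
    )
    # The second model adds one mean and one variance per dimension.
    penalty = 0.5 * penalty_weight * (2 * dims) * math.log(n)
    return gain - penalty


def make_segment(rng, n, cepstral_mean, azimuth_mean):
    """Synthetic frames: [cepstral-like value, location-like value] (assumed)."""
    return [
        [rng.gauss(cepstral_mean, 1.0), rng.gauss(azimuth_mean, 0.1)]
        for _ in range(n)
    ]
```

With these definitions, `delta_bic(make_segment(rng, 200, 0.0, 0.5), make_segment(rng, 200, 3.0, 2.0))` compares two segments that differ in both modalities, so the location dimension contributes to the decision alongside the cepstral one. A real system would use full mel-cepstral vectors and measured location estimates rather than synthetic scalars.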

Type
doctoral thesis
DOI
10.5075/epfl-thesis-3689
Author(s)
Lathoud, Guillaume  
Advisors
Bourlard, Hervé  
•
Odobez, Jean-Marc  
Jury

Christof Faller, Rainer Martin, Steve Renals

Date Issued

2007

Publisher

EPFL

Publisher place

Lausanne

Public defense date

2006-12-22

Thesis number

3689

Total of pages

268

Subjects

  • microphone arrays
  • speaker localization, tracking, segmentation, and clustering
  • spontaneous multi-party speech processing
  • antennes de microphones
  • localisation, suivi, segmentation et groupage de locuteurs
  • traitement de la parole spontanée de plusieurs locuteurs

EPFL units
LIDIAP  
Faculty
STI  
Section
STI-SEL  
School
IEL  
Available on Infoscience
October 31, 2006
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/235413
  • Contact
  • infoscience@epfl.ch


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved