Unsupervised Location-Based Segmentation of Multi-Party Speech

Lathoud, Guillaume; McCowan, Iain A.; Odobez, Jean-Marc

conference paper

2004

Proceedings of the 2004 ICASSP-NIST Meeting Recognition Workshop

Accurate detection and segmentation of spontaneous multi-party speech is crucial for a variety of applications, including speech acquisition and recognition, as well as higher-level event recognition. However, the highly sporadic nature of spontaneous speech makes this task difficult. Moreover, multi-party speech contains many overlaps. We propose to attack this problem as a tracking task, using location cues only. To best deal with high sporadicity, we propose a novel, generic, short-term clustering algorithm that can track multiple objects at a low computational cost. The proposed approach is online, fully deterministic, and can run in real time. In an application to real meeting data, the algorithm produces high-precision speech segmentation.

Name: lathoud04a.pdf
Access type: openaccess
Size: 1.15 MB
Format: Adobe PDF
Checksum (MD5): d6f45d2c4c7e5707256b6ff5557dd2ed