Abstract

Overlapping speech is a source of significant errors in speaker diarization of spontaneous meeting recordings. Recent works on speaker diarization have attempted to solve the problem of overlap detection using classifiers trained on acoustic and spatial features. This paper proposes a method to improve the short-term spectral feature based overlap detector by incorporating information from long-term conversational features in the form of speaker change statistics. The statistics are obtained at segment level(around few seconds) from the output of a diarization system. The approach is motivated by the observation that segments containing more speaker changes are more probable to have more overlaps. Experiments on AMI meeting corpus reveal that the number of overlaps in a segment follows a Poisson distribution whose rate is directly proportional to the number of speaker changes in the segment. When this information is combined with acoustic information in an HMM/GMM overlap detector, improvements are verified in terms of F-measure and consequently, diarization error (DER) is reduced by 5% relative to the baseline overlap detector.

Details

Actions