Abstract

In this paper, we apply speaker diarization strategies from a single source to the task of estimating the dominant person in a group meeting. Previous work has shown that speaking length is strongly correlated with perceived dominance. Here we investigate this in more depth by considering two dominance tasks where there is full and majority agreement amongst ground-truth annotators. In addition, we investigate how 24 different speed-up and algorithmic strategies, and source types lead to interesting outcomes when applied to dominance estimation. We obtained the best performance of $77\%$ using our slowest scheme and a single distant microphone (SDM). Within the top 3 out of 24 performing experiments in both dominance tasks, we show that we can use the furthest SDM, with no prior knowledge of the number of speakers and the fastest diarization scheme, which performs $1.3$ times faster than real-time.

Details

Actions