000207938 001__ 207938
000207938 005__ 20190619023659.0
000207938 0247_ $$2doi$$a10.5075/epfl-thesis-6542
000207938 02470 $$2urn$$aurn:nbn:ch:bel-epfl-thesis6542-3
000207938 02471 $$2nebis$$a10438581
000207938 037__ $$aTHESIS
000207938 041__ $$aeng
000207938 088__ $$a6542
000207938 245__ $$aSpeaker diarization of spontaneous meeting room conversations
000207938 269__ $$a2015
000207938 260__ $$bEPFL$$c2015$$aLausanne
000207938 336__ $$aTheses
000207938 502__ $$aProf. A. Skrivervik (president) ; Prof. H. Bourlard (director) ; Dr X. Anguera, Dr A. Stolcke, Dr J.-M. Vesin (examiners)
000207938 520__ $$aSpeaker diarization is the task of identifying "who spoke when" in an audio stream containing multiple speakers. This is an unsupervised task, as there is no a priori information about the speakers. Diagnostic studies on state-of-the-art diarization systems have isolated three main issues: overlapping speech, the effects of background noise and speech/non-speech detection errors on clustering, and significant performance variance between different systems. In this thesis we focus on addressing these issues in diarization. We propose new features for overlap detection based on the structure of a conversation, such as silence and speaker change statistics. The features are estimated from a long-term context (3-4 seconds) and are used to estimate the probability of overlap at a given instant. These probabilities are later incorporated into an acoustic-feature-based overlap detector as prior probabilities. Experiments on several meeting corpora reveal that overlap detection is improved significantly by the proposed method and that this consequently reduces the diarization error. To address the issues arising from background noise and errors in speech/non-speech detection, and to capture speaker-discriminative information in the signal, we propose two methods. In the first method, we propose Information Bottleneck with Side Information (IBSI) based diarization to suppress artefacts of background noise and non-speech segments introduced into clustering. In the second method, we show that the phoneme transcript of a given recording carries useful information for speaker diarization. This observation is used to estimate a phoneme background model, which is used for diarization in the Information Bottleneck (IB) framework. Both methods achieve a significant reduction in error on various meeting corpora.
We train different artificial neural network (ANN) architectures to extract speaker-discriminant features and use these features as input to speaker diarization systems. The ANNs are trained to perform related tasks such as speaker comparison, speaker classification and auto-encoding. The bottleneck layer activations from these networks are used as features for speaker diarization. Experiments on different meeting corpora reveal that the combination of MFCCs and ANN features reduces the diarization error. To address the issue of performance variations across different systems, we propose feature-level combination of HMM/GMM and IB diarization systems. The combination does not require any changes to the original systems. The output of the IB system is used to generate features which, when combined with MFCCs in an HMM/GMM system, reduce the diarization error.
000207938 6531_ $$aSpeaker diarization
000207938 6531_ $$ameeting room conversations
000207938 6531_ $$aconversational speech
000207938 6531_ $$aoverlapping speech
000207938 6531_ $$aclustering with side information
000207938 6531_ $$aphoneme background model
000207938 6531_ $$aartificial neural network features
000207938 6531_ $$ainformation bottleneck features
000207938 6531_ $$asystem combination
000207938 700__ $$0246043$$g204643$$aYella, Sree Harsha
000207938 720_2 $$aBourlard, Hervé$$edir.$$g117014$$0243348
000207938 8564_ $$uhttps://infoscience.epfl.ch/record/207938/files/EPFL_TH6542.pdf$$zn/a$$s925411$$yn/a
000207938 909C0 $$xU10381$$0252189$$pLIDIAP
000207938 909CO $$pthesis-bn2018$$pthesis-public$$pDOI$$ooai:infoscience.tind.io:207938$$qGLOBAL_SET$$pSTI$$pthesis$$qDOI2
000207938 917Z8 $$x108898
000207938 917Z8 $$x108898
000207938 918__ $$dEDEE$$cIEL$$aSTI
000207938 919__ $$aLIDIAP
000207938 920__ $$b2015$$a2015-5-26
000207938 970__ $$a6542/THESES
000207938 973__ $$sPUBLISHED$$aEPFL
000207938 980__ $$aTHESIS