INFORMATION THEORETIC CLUSTERING FOR UNSUPERVISED DOMAIN-ADAPTATION
The aim of the domain-adaptation task for speaker verification is to exploit unlabelled target domain data by using the labelled source domain data effectively. The i-vector based Probabilistic Linear Discriminant Analysis (PLDA) framework approaches this task by clustering the target domain data and treating each cluster as a unique speaker to estimate PLDA model parameters. These parameters are then combined with the PLDA parameters from the source domain. Typically, agglomerative clustering with a cosine distance measure is used. In tasks such as speaker diarization, which also require unsupervised clustering of speakers, information-theoretic clustering measures have been shown to be effective. In this paper, we employ the Information Bottleneck (IB) clustering technique to find speaker clusters in the target domain data. This is achieved by optimizing the IB criterion, which minimizes the information loss during the clustering process. The greedy optimization of the IB criterion involves agglomerative clustering using the Jensen-Shannon divergence as the distance metric. Our experiments in the domain-adaptation task indicate that the proposed system outperforms the baseline by about 14% relative in terms of equal error rate.
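The greedy optimization described above can be illustrated with a minimal sketch. It is not the system evaluated in the paper. It assumes that each target-domain utterance has already been mapped to a discrete distribution p(y|x) over some set of relevance variables; how that distribution is obtained from i-vectors is not specified here. It also uses the simplified merge cost that keeps only the Jensen-Shannon term, which corresponds to the hard-clustering limit of the IB trade-off. The function names `agglomerative_ib` and `weighted_js` are hypothetical.

```python
import numpy as np


def kl(p, q):
    """KL divergence D(p || q) in nats, skipping zero-probability entries of p."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def weighted_js(p, q, wp, wq):
    """Jensen-Shannon divergence of p and q with mixture weights wp + wq = 1."""
    m = wp * p + wq * q
    return wp * kl(p, m) + wq * kl(q, m)


def agglomerative_ib(p_y_given_x, n_clusters, p_x=None):
    """Greedy agglomerative clustering with a JS-divergence merge cost.

    p_y_given_x: array of shape (n_items, n_relevance), each row sums to 1.
    p_x: optional prior over items (assumed uniform if omitted).
    Returns an integer cluster label per item.
    """
    p_y_given_x = np.asarray(p_y_given_x, dtype=float)
    n = p_y_given_x.shape[0]
    if p_x is None:
        p_x = np.full(n, 1.0 / n)

    clusters = [[i] for i in range(n)]          # member indices per cluster
    weights = [float(w) for w in p_x]           # p(c) per cluster
    dists = [row.copy() for row in p_y_given_x]  # p(y|c) per cluster

    while len(clusters) > n_clusters:
        best = None
        # Find the pair whose merge loses the least information.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                w = weights[i] + weights[j]
                cost = w * weighted_js(
                    dists[i], dists[j], weights[i] / w, weights[j] / w
                )
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        w = weights[i] + weights[j]
        # The merged cluster's p(y|c) is the weighted average of its parts.
        dists[i] = (weights[i] * dists[i] + weights[j] * dists[j]) / w
        weights[i] = w
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j], weights[j], dists[j]

    labels = np.empty(n, dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

In a domain-adaptation setting, the returned labels would play the role of pseudo speaker identities for estimating target-domain PLDA parameters. The exhaustive pair search is cubic in the number of items overall, so the sketch is only meant for small illustrative inputs.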
Record created on 2016-04-19, modified on 2016-08-09