Journal article

Modified group delay feature based total variability space modelling for speaker recognition

In this paper, modified group delay (MODGD) features are used to model target speakers in the Total Variability Space (TVS) framework for speaker recognition. MODGD-based features have been shown to improve speaker recognition performance owing to the ability of group delay functions to emphasise formants. The basis vectors of the TVS are estimated using the probabilistic principal component analysis (PPCA) algorithm, while the i-vectors for a speaker are extracted using the conventional technique. The estimation of the total variability space is simplified by applying a simple transformation to the supervectors. This results in a significant speed-up in the estimation of the TVS hyperparameters, as the computational complexity of the PPCA algorithm is lower than that of the conventional procedure. This is important because the estimation procedure needs to handle large amounts of data. The technique has already been shown to provide a speed-up of 16×. The performance of the MODGD-based system is compared with that of the MFCC-based system on the NIST SRE 2010 benchmark dataset. Two types of fusion are tested in this work: fusion at the i-vector level and fusion at the score level. A considerable improvement in the Equal Error Rate (EER) is observed when these fusion techniques are employed. The result is a robust speaker recognition system with decreased development time.
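
As an illustration of the two fusion types and the EER metric named above, the following is a minimal Python sketch and not the implementation used in the paper. It assumes that i-vector-level fusion is done by concatenating length-normalised i-vectors, that score-level fusion is a weighted sum, and that trials are scored by cosine similarity; the abstract specifies none of these choices. All function names and the `weight` parameter are hypothetical.

```python
import numpy as np


def cosine_score(a, b):
    """Cosine similarity between two i-vectors (assumed scoring rule)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def fuse_ivectors(iv_modgd, iv_mfcc):
    """i-vector-level fusion, assumed here to be concatenation.

    Each i-vector is length-normalised first so that neither feature
    stream dominates the fused vector.
    """
    iv_modgd = np.asarray(iv_modgd, dtype=float)
    iv_mfcc = np.asarray(iv_mfcc, dtype=float)
    iv_modgd = iv_modgd / np.linalg.norm(iv_modgd)
    iv_mfcc = iv_mfcc / np.linalg.norm(iv_mfcc)
    return np.concatenate([iv_modgd, iv_mfcc])


def fuse_scores(scores_modgd, scores_mfcc, weight=0.5):
    """Score-level fusion, assumed here to be a weighted sum.

    `weight` is a hypothetical tuning parameter in [0, 1].
    """
    scores_modgd = np.asarray(scores_modgd, dtype=float)
    scores_mfcc = np.asarray(scores_mfcc, dtype=float)
    return weight * scores_modgd + (1.0 - weight) * scores_mfcc


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping the threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The EER is
    taken at the threshold where the false-acceptance and false-rejection
    rates are closest, as the mean of the two rates at that point.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    frr = np.array([(target < t).mean() for t in thresholds])
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2.0)
```

In such a setup, `fuse_ivectors` would be applied before scoring, whereas `fuse_scores` would combine the outputs of two separately scored systems; either set of resulting scores can then be passed to `equal_error_rate` for comparison against the single-feature baselines.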

