An Investigation of Spectral Subband Centroids for Speaker Authentication
Most conventional features used in speaker authentication are based, in one way or another, on estimates of the spectral envelope, usually in cepstral form, e.g., Mel-scale Filterbank Cepstrum Coefficients (MFCCs), Linear-scale Filterbank Cepstrum Coefficients (LFCCs) and Relative Spectral Perceptual Linear Prediction (RASTA-PLP). This study examines Spectral Subband Centroids (SSCs) instead. Each SSC is the centroid frequency of one subband; it has properties similar to those of a formant frequency but is confined to that subband. Preliminary empirical findings on a subset of the XM2VTS database, obtained with Analysis of Variance and Linear Discriminant Analysis, suggest, firstly, that a certain number of centroids (up to about 16) is necessary to capture enough information about the speaker's identity and, secondly, that SSCs could provide information complementary to that of conventional MFCCs. Theoretical findings suggest that mean-subtracted SSCs are more robust to additive noise. Further experiments on the more realistic NIST2001 database, using SSCs, MFCCs (respectively LFCCs) and their combinations by concatenation, suggest that SSCs are indeed robust features that complement the conventional MFCC (respectively LFCC) features often used in speaker authentication.
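To make the notion of a subband centroid concrete, the following is a minimal Python sketch and not the report's implementation. The abstract does not give the exact formula, so the sketch assumes the common definition of a centroid as the power-weighted mean frequency within a subband. The function name `subband_centroids`, the Hamming window, the rectangular, non-overlapping, linearly spaced subbands and the exponent `gamma` are all illustrative assumptions.

```python
import numpy as np

def subband_centroids(frame, sample_rate, n_subbands=16, gamma=1.0):
    """Return one centroid frequency (Hz) per subband for a single frame.

    Assumed definition: power-weighted mean frequency inside each subband,
    with rectangular, non-overlapping, linearly spaced subbands.
    """
    # Power spectrum of the Hamming-windowed frame.
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Subband edges from 0 Hz to the Nyquist frequency.
    edges = np.linspace(0.0, sample_rate / 2.0, n_subbands + 1)
    centroids = np.empty(n_subbands)
    for m in range(n_subbands):
        lo, hi = edges[m], edges[m + 1]
        if m == n_subbands - 1:
            mask = (freqs >= lo) & (freqs <= hi)  # last band includes Nyquist
        else:
            mask = (freqs >= lo) & (freqs < hi)
        weights = power[mask] ** gamma
        total = weights.sum()
        # Fall back to the band midpoint if the band carries no energy.
        if total > 0:
            centroids[m] = (freqs[mask] * weights).sum() / total
        else:
            centroids[m] = 0.5 * (lo + hi)
    return centroids
```

For a pure tone, the centroid of the subband containing the tone should sit close to the tone frequency, while every centroid stays inside its own subband by construction. The mean subtraction mentioned in the abstract would then amount to subtracting, for each subband, the average of its centroid over the frames of an utterance.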
rr03-62.pdf