Scalable Probabilistic Models for Face and Speaker Recognition

El Shafey, Laurent

doi:10.5075/epfl-thesis-6175

doctoral thesis

2014

In the biometrics community, face and speaker recognition are mature fields in which several systems have been proposed over the past twenty years. While existing systems perform well under controlled recording conditions, mismatch caused by the use of different sensors or a lack of cooperation from the subject still significantly affects performance, especially in challenging scenarios such as forensics. Furthermore, existing methods suffer from scalability issues, which prevent them from taking advantage of increasingly large amounts of training data, even though using more data is a promising way to improve accuracy in such challenging scenarios. In this thesis we address these problems of mismatch and complexity by developing scalable probabilistic models that we apply to face, speaker and bimodal recognition. Our contributions are four-fold. First, we propose a unified framework for session variability modeling techniques based on Gaussian mixture models (GMM) that encompasses inter-session variability (ISV) modeling, joint factor analysis (JFA) and total variability (TV) modeling. Second, we propose a novel exact and scalable formulation of probabilistic linear discriminant analysis (PLDA), a probabilistic and generative framework that models between-class and within-class variations. This formulation solves a major scalability issue by improving both the time complexity of the training procedure, from cubic to linear with respect to the number of samples per class, and the complexity of the scoring procedure. Furthermore, the implementations of all the proposed techniques are integrated into a novel collaborative open source software library called Bob that enforces fair evaluations and encourages reproducible research. Fourth and finally, large-scale experiments are conducted with all of the above machine learning algorithms on several databases such as FRGC for face recognition, NIST SRE12 for speaker recognition and MOBIO for bimodal recognition, showing competitive performance.

Name

EPFL_TH6175.pdf

Access type

restricted

Size

15.83 MB

Format

Adobe PDF

Checksum (MD5)

88a0718bc95f06f9f278fd3a70dc525b