Conference paper

Learning to recognise talking faces

An approach to person identification is described, based on spatio-temporal analysis of the talking face. A person is represented by a parametric model of the visible speech articulators and their temporal characteristics during speech production. The model consists of shape parameters, which represent the lip contour, and intensity parameters, which represent the grey-level distribution in the mouth region. The model is used to track the lips in image sequences, and the model parameters are recovered from the tracking results. While some of these parameters carry speech information, others relate intuitively to the identity of the speaker, and we show that models based on these features enable successful person identification. We model the shape and intensity parameters as mixtures of Gaussians and their temporal dependencies with Hidden Markov Models. A talking person is identified by estimating the likelihood that each person's model generated the observed sequence of features; the model with the highest likelihood is chosen as the identified person.
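
The identification step described above can be illustrated with a minimal sketch. It assumes one Hidden Markov Model per person with diagonal-covariance Gaussian-mixture emissions, scores an observed feature sequence with the forward algorithm in the log domain, and picks the highest-scoring model. All names (`toy_model`, `identify`, the persons "alice" and "bob"), the model sizes, and the synthetic features are hypothetical and chosen only for illustration; the abstract does not specify the feature dimensionality, the number of states or mixture components, or how the models are trained.

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def log_emission(x, weights, means, variances):
    """Log density of a Gaussian mixture at x (log-sum-exp over components)."""
    comps = [np.log(w) + log_gauss_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    return np.logaddexp.reduce(comps)

def sequence_log_likelihood(obs, model):
    """Forward algorithm in the log domain: log p(obs | model)."""
    log_start = np.log(model["start"])
    log_trans = np.log(model["trans"])

    def emissions(x):
        # Log emission density of x under every state's mixture.
        return np.array([log_emission(x, *state) for state in model["states"]])

    alpha = log_start + emissions(obs[0])
    for x in obs[1:]:
        # Sum over previous states (rows), then add the emission term.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_trans, axis=0) + emissions(x)
    return np.logaddexp.reduce(alpha)

def identify(obs, models):
    """Return the name of the highest-likelihood model and all scores."""
    scores = {name: sequence_log_likelihood(obs, m) for name, m in models.items()}
    return max(scores, key=scores.get), scores

def toy_model(offset):
    """Hypothetical 2-state HMM with 2 mixture components per state (assumed values)."""
    states = []
    for s in range(2):
        weights = np.array([0.6, 0.4])
        means = np.array([[offset + s, 0.0], [offset + s + 0.5, 1.0]])
        variances = np.ones((2, 2))
        states.append((weights, means, variances))
    return {
        "start": np.array([0.5, 0.5]),
        "trans": np.array([[0.9, 0.1], [0.2, 0.8]]),
        "states": states,
    }

# Two hypothetical persons whose features occupy different regions.
models = {"alice": toy_model(0.0), "bob": toy_model(5.0)}

# Synthetic 2-D feature sequence drawn near the second model's means.
rng = np.random.default_rng(0)
obs = rng.normal(loc=[5.5, 0.5], scale=1.0, size=(20, 2))

name, scores = identify(obs, models)
print(name, scores)
```

Working in the log domain avoids numerical underflow on long sequences, which is why the sketch accumulates log-probabilities with `np.logaddexp` instead of multiplying raw probabilities.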
