This work addresses the problem of tracking, in real video sequences, the global motion of an infant's face as well as the local motion of its inner features. This is a challenging task in the field of Computer Vision because facial appearance varies considerably within a video sequence, most notably due to changes in head pose, expression, lighting, and occlusion. Consequently, much research has been devoted to face tracking as a particularly difficult case of non-rigid object tracking. The task requires, by definition, a model that describes the expected structure of the face. Explaining image data in terms of model parameters has the advantage of providing a natural basis for further interpretation, which Human-Machine Interaction applications can exploit.