Abstract

This thesis describes the design and implementation of a framework that tracks and identifies multiple people in a crowded scene captured by multiple cameras. A people detector first estimates the position of each individual. The face detector uses these position estimates to prune the search space of possible face locations and to reduce false positives. A face classifier then assigns identities to the trajectories. Beyond recognizing the people in the scene, the tracker exploits the face information to minimize identity switches; only sparse face recognitions are required to generate identity-preserving trajectories. Three face detectors are evaluated against the project requirements. The face model of a person is described by Local Binary Pattern (LBP) histogram features extracted from a number of face patches captured by different cameras. The face model is shared between cameras, so one camera can recognize a face using patches captured by a different camera. Three classifiers are tested for the recognition task, and an SVM is eventually employed. Owing to the properties of LBP, recognition is robust to illumination changes and facial expressions. In addition, the SVM is trained on multiple views of each person's face, which makes recognition robust to pose changes as well. The system is integrated with two trackers: the state-of-the-art Multi-Commodity Network Flow tracker and a frame-by-frame Kalman tracker. We validate our method on two datasets generated for this purpose. Integrating face information with the people tracker demonstrates excellent performance and significantly improves tracking results on crowded scenes, while also providing the identities of the people in the scene.
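
To illustrate why LBP histogram features tolerate illumination changes, the following is a minimal sketch of the basic 8-neighbour LBP operator applied to a single grayscale patch. It is not the thesis implementation: the function names `lbp_codes` and `lbp_histogram`, the neighbour ordering, the `>=` comparison, and the 256-bin layout are all assumptions made for illustration, and the thesis may use a different LBP variant.

```python
from typing import List

# Offsets of the 8 neighbours, clockwise from the top-left pixel.
# The bit ordering is an illustrative assumption, not taken from the thesis.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(patch: List[List[int]]) -> List[int]:
    """Return the 8-bit LBP code of every interior pixel of a grayscale patch."""
    rows, cols = len(patch), len(patch[0])
    codes = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = patch[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOURS):
                # Set the bit when the neighbour is at least as bright as the centre.
                if patch[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            codes.append(code)
    return codes


def lbp_histogram(patch: List[List[int]]) -> List[float]:
    """Return the normalised 256-bin histogram of LBP codes for one patch."""
    codes = lbp_codes(patch)
    hist = [0.0] * 256
    for code in codes:
        hist[code] += 1.0
    total = len(codes)
    return [h / total for h in hist] if total else hist


if __name__ == "__main__":
    patch = [[10, 20, 30],
             [40, 50, 60],
             [70, 80, 90]]
    # Simulate a uniform brightness increase of the same patch.
    brighter = [[v + 100 for v in row] for row in patch]
    print(lbp_histogram(patch) == lbp_histogram(brighter))
```

Because each code depends only on whether a neighbour is brighter or darker than the centre pixel, a uniform brightness shift (or any monotonic gray-level change) leaves the histogram unchanged. In a pipeline like the one described above, histograms from several patches of a face would typically be concatenated into one feature vector before being passed to a classifier such as an SVM.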
