Spatio-temporal human pose detection

This thesis proposes a Chamfer-based method for human body pose detection that combines silhouette matching, motion information, and statistical relevance estimates in an original way. We demonstrate that our method can not only detect people but also recover their full 3D pose when they are seen from different viewpoints and at different scales, and when the background is cluttered and background subtraction is impractical because the camera moves. We introduce spatio-temporal templates that consist of short sequences of 2D silhouettes obtained from motion capture data. The motion information is therefore inherent in the templates. This matters because human motion differs markedly from other kinds of motion and can be used effectively to distinguish people from both static and moving background objects. The templates handle different camera views as well as different scales, and they are matched against short image sequences. During a training phase, we use statistical learning techniques to estimate and store the relevance of the different silhouette parts to the recognition task. At run time, we use these relevance estimates to convert Chamfer distances into meaningful probability estimates. For walking motions, for example, this accounts for the fact that the feet and shoulders provide much more discriminative information than the trunk. Using the probability estimates makes the recognition algorithm much more discriminating. To demonstrate our approach, we chose two types of motion: walking and golf swings. All the walking templates represent the specific phase of the walking cycle in which the feet are on the ground and the angle between the legs is greatest. The characteristic pose that we chose for a golf swing is the beginning of the downswing, when the arms are at their highest position. To further improve the performance of our algorithm, we use dynamic programming to link the various detections of walking people into plausible trajectories.
We then filter out the detections that do not lie on the recovered trajectory. This eliminates all detections with the wrong orientation, as well as any false positives on the background. Finally, we show that the reliable specific 3D poses provided by our approach allow us to treat the 3D tracking of human motion as an interpolation problem which, unlike traditional tracking approaches, is both robust and fully automated.
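The relevance-weighted matching step can be illustrated with a small sketch. The text above does not give the exact weighting or the distance-to-probability mapping, so this example assumes that each template point carries a learned relevance weight, that per-point distances are truncated at a threshold `tau`, and that a decreasing logistic curve with hypothetical `slope` and `threshold` parameters converts the distance into a probability. It uses a brute-force nearest-edge search in place of the distance transform usually used for Chamfer matching. All function names are hypothetical.

```python
import math

import numpy as np


def truncated_distances(template_pts, edge_pts, tau):
    """Distance from each template point to its nearest image edge point, capped at tau.

    Brute-force nearest-neighbour search (assumption: small point sets); a practical
    implementation would look these values up in a precomputed distance transform.
    Assumes at least one edge point per frame.
    """
    diff = template_pts[:, None, :] - edge_pts[None, :, :]
    nearest = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    return np.minimum(nearest, tau)


def spatiotemporal_chamfer(template_seq, weight_seq, edge_seq, tau=20.0):
    """Relevance-weighted Chamfer distance between a silhouette sequence and edge maps.

    template_seq[k]: (row, col) silhouette points of template frame k
    weight_seq[k]:   hypothetical per-point relevance weights learned in training
    edge_seq[k]:     (row, col) edge points of image frame k
    """
    weighted_sum, total_weight = 0.0, 0.0
    for pts, weights, edges in zip(template_seq, weight_seq, edge_seq):
        d = truncated_distances(np.asarray(pts, float), np.asarray(edges, float), tau)
        w = np.asarray(weights, float)
        weighted_sum += float(w @ d)  # discriminative parts (e.g. feet) count more
        total_weight += float(w.sum())
    return weighted_sum / total_weight


def match_probability(distance, slope=1.0, threshold=5.0):
    """Map a Chamfer distance to a probability with a decreasing logistic curve (assumed form)."""
    return 1.0 / (1.0 + math.exp(slope * (distance - threshold)))
```

For instance, a straight edge at row 0 and a template shifted three rows down give a distance of 3.0 per frame, which the default curve maps to about 0.88. Raising a point's weight makes its mismatch dominate the score, which is how parts such as the feet and shoulders can outweigh the trunk.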
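The trajectory-linking and filtering steps can be sketched as a Viterbi-style dynamic program over consecutive detection frames. The text does not specify the cost function, so everything here is an assumption: each detection has an image position, an estimated walking heading and a match probability in (0, 1]; exactly one detection is kept per frame; links longer than a hypothetical `MAX_STEP` are forbidden; and a hypothetical `HEADING_WEIGHT` penalizes detections whose heading disagrees with the direction of motion. The names `Detection`, `link_cost`, `best_trajectory` and `filter_detections` are illustrative only.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Hypothetical tuning constants (not taken from the thesis).
MAX_STEP = 60.0       # largest plausible displacement between consecutive detections (pixels)
HEADING_WEIGHT = 2.0  # penalty per radian of heading/motion mismatch


@dataclass
class Detection:
    x: float
    y: float
    heading: float  # estimated walking direction in the image plane (radians)
    prob: float     # match probability from template matching, assumed in (0, 1]


def angle_diff(a: float, b: float) -> float:
    """Absolute difference between two angles, wrapped to [0, pi]."""
    return abs(math.atan2(math.sin(a - b), math.cos(a - b)))


def link_cost(prev: Detection, cur: Detection) -> float:
    """Cost of linking two consecutive detections; infinite if the step is implausible."""
    dx, dy = cur.x - prev.x, cur.y - prev.y
    if math.hypot(dx, dy) > MAX_STEP:
        return math.inf
    motion = math.atan2(dy, dx)
    # Both detections should face the direction in which the person actually moves.
    return HEADING_WEIGHT * (angle_diff(prev.heading, motion) + angle_diff(cur.heading, motion))


def best_trajectory(frames: list[list[Detection]]) -> list[int] | None:
    """Index of the kept detection in each frame along the cheapest trajectory, or None."""
    # cost[t][j]: cheapest total cost of any trajectory ending at detection j of frame t.
    cost = [[-math.log(d.prob) for d in frames[0]]]
    back = [[-1] * len(frames[0])]
    for t in range(1, len(frames)):
        row, brow = [], []
        for cur in frames[t]:
            best_c, best_i = math.inf, -1
            for i, prev in enumerate(frames[t - 1]):
                c = cost[t - 1][i] + link_cost(prev, cur)
                if c < best_c:
                    best_c, best_i = c, i
            row.append(best_c - math.log(cur.prob))
            brow.append(best_i)
        cost.append(row)
        back.append(brow)
    last = min(range(len(frames[-1])), key=lambda j: cost[-1][j])
    if math.isinf(cost[-1][last]):
        return None  # no chain of plausible links spans all frames
    path = [last]
    for t in range(len(frames) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def filter_detections(frames: list[list[Detection]]) -> list[Detection]:
    """Keep only the detections that lie on the recovered trajectory."""
    path = best_trajectory(frames)
    return [] if path is None else [frames[t][j] for t, j in enumerate(path)]
```

In this sketch, a high-probability detection far from the path and a detection facing backwards both lose to a consistent chain of forward-facing detections, which mirrors how off-trajectory false positives and wrongly oriented detections are discarded.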
