Bridging the Gap between Detection and Tracking for 3D Human Motion Recovery

The aim of this thesis is to build a system that automatically and robustly tracks human motion in 3D from monocular input. To this end, two approaches are introduced, each addressing a different type of motion. The first is suited to activities in which a characteristic pose, or key-pose, can be detected, as in walking. The second is suited to activities in which no such pose is defined but there is a clear relation between easily measurable image quantities and the body configuration, as in skating, where the trajectory followed by a subject is highly correlated with how the subject articulates.

In the first approach, we combine detection and tracking techniques to achieve robust 3D motion recovery of people seen from arbitrary viewpoints by a single and potentially moving camera. We rely on detecting key postures, which can be done reliably, using a motion model to infer 3D poses between consecutive detections, and finally refining them over the whole sequence using a generative model. We demonstrate our approach on golf motions filmed with a static camera and walking motions acquired with a potentially moving one. We show that this approach, although monocular, is both metrically accurate, because it integrates information over many frames, and robust, because it can recover from a few misdetections.

The second approach is based on the fact that the articulated body models used to represent human motion typically have many degrees of freedom, usually expressed as joint angles that are highly correlated. The true range of motion can therefore be represented by latent variables that span a low-dimensional space, a property that has often been used to make motion tracking easier. However, learning the latent space in a problem-independent way makes it nontrivial to initialize the tracking process by picking appropriate initial values for the latent variables, and thus for the pose. In this thesis, it is shown that directly using observable quantities as latent variables eliminates this issue.
