Bridging the Gap between Detection and Tracking for 3D Monocular Video-Based Motion Capture
We combine detection and tracking techniques to achieve robust 3-D motion recovery of people seen from arbitrary viewpoints by a single and potentially moving camera. We rely on detecting key postures, which can be done reliably, using a motion model to infer 3-D poses between consecutive detections, and finally refining them over the whole sequence using a generative model. We demonstrate our approach in the case of people walking against cluttered backgrounds and filmed using a moving camera, which precludes the use of simple background subtraction techniques. In this case, the easy-to-detect posture is the one that occurs at the end of each step when people have their legs furthest apart.