A Novel Representation of Parts for Accurate 3D Object Detection and Tracking in Monocular Images

We present a method that estimates the 3D pose of a known object in real time and under challenging conditions. Our method relies only on grayscale images, since depth cameras fail on metallic objects; it can handle poorly textured objects and cluttered, changing environments; and the pose it predicts degrades gracefully in the presence of large occlusions. As a result, in contrast with the state of the art, our method is suitable for practical Augmented Reality applications, even in industrial environments. To be robust to occlusions, we first learn to detect some parts of the target object. Our key idea is then to predict the 3D pose of each part in the form of the 2D projections of a few control points. The advantages of this representation are three-fold: we can predict the 3D pose of the object even when only one part is visible; when several parts are visible, we can combine them easily to compute a better pose of the object; and the 3D pose we obtain is usually very accurate, even when only a few parts are visible.
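To make the control-point representation concrete, the following is a minimal sketch, not the authors' implementation. It assumes each part carries a few control points with known 3D coordinates in the object frame (here, hypothetically, the eight corners of a small box around the part), and that their 2D projections have already been predicted. In this sketch the predictions are simulated by projecting with a known pose, and a textbook Direct Linear Transform stands in for whichever pose solver the paper actually uses. The names `box_corners`, `estimate_pose`, the part labels, and the camera intrinsics are all illustrative assumptions.

```python
import numpy as np

def box_corners(center, half_size):
    """Hypothetical control points: the 8 corners of a box around a part."""
    signs = np.array([[sx, sy, sz] for sx in (-1, 1)
                      for sy in (-1, 1) for sz in (-1, 1)], dtype=float)
    return np.asarray(center, dtype=float) + half_size * signs

def project(K, R, t, pts3d):
    """Pinhole projection of Nx3 object-frame points to Nx2 pixel coordinates."""
    cam = pts3d @ R.T + t            # object frame -> camera frame
    uvw = cam @ K.T                  # camera frame -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]

def pose_from_correspondences(K, pts3d, pts2d):
    """Direct Linear Transform; needs at least 6 non-coplanar points."""
    ones = np.ones((len(pts2d), 1))
    # Remove the intrinsics so the unknown 3x4 matrix is [R|t] up to scale.
    xn = (np.linalg.inv(K) @ np.hstack([pts2d, ones]).T).T
    X = np.hstack([pts3d, ones])
    rows = []
    for (x, y, _), Xh in zip(xn, X):
        rows.append(np.concatenate([Xh, np.zeros(4), -x * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -y * Xh]))
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    P = Vt[-1].reshape(3, 4)
    if P[2] @ X[0] < 0:              # choose the sign that gives positive depth
        P = -P
    U, S, Vt_r = np.linalg.svd(P[:, :3])
    R = U @ Vt_r                     # closest rotation to the scaled 3x3 block
    t = P[:, 3] / S.mean()           # undo the unknown scale
    return R, t

def estimate_pose(K, control_points, predictions):
    """Pool the 2D-3D correspondences of every visible part, then solve."""
    visible = sorted(predictions)
    pts3d = np.vstack([control_points[name] for name in visible])
    pts2d = np.vstack([predictions[name] for name in visible])
    return pose_from_correspondences(K, pts3d, pts2d)

def rot_xy(ax, ay):
    """Rotation about the x axis by ax, followed by the y axis by ay (radians)."""
    cx, sx, cy, sy = np.cos(ax), np.sin(ax), np.cos(ay), np.sin(ay)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    return Ry @ Rx

# Assumed camera intrinsics and two hypothetical parts (object frame, metres).
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
control_points = {
    "part_a": box_corners((0.2, 0.0, 0.0), 0.1),
    "part_b": box_corners((-0.2, 0.1, 0.05), 0.1),
}

# Simulated "predictions": ideal projections under a known ground-truth pose.
R_true, t_true = rot_xy(0.2, 0.3), np.array([0.1, -0.05, 1.5])
predictions = {name: project(K, R_true, t_true, pts)
               for name, pts in control_points.items()}

# Pose from a single visible part, then from both parts pooled together.
R_one, t_one = estimate_pose(K, control_points, {"part_a": predictions["part_a"]})
R_all, t_all = estimate_pose(K, control_points, predictions)
```

Because every part contributes correspondences in the same object frame, combining parts amounts to stacking more rows into the same linear system, and an occluded part is simply left out of `predictions`. With noisy predictions, a robust solver with nonlinear refinement would be the more usual choice than the plain DLT shown here.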

Presented at:
International Conference on Computer Vision (ICCV), Santiago, Chile, December 13-16, 2015

 Record created 2015-10-02, last modified 2018-03-17
