Towards Recognizing Feature Points using Classification Trees

In earlier work \cite{Lepetit04b}, we proposed treating wide baseline matching of feature points as a classification problem and presented an implementation based on K-means and nearest neighbor classification. We showed that this method is both reliable and faster than competing methods, but still too slow for real-time implementations. Here we show that using decision trees instead greatly speeds up the computation while increasing robustness. This allows point matching under large viewpoint and illumination changes that is suitable for accurate object pose estimation at 25 Hz on a standard Pentium IV PC.

Most previous methods rely either on {\em ad hoc} local descriptors or on estimating local affine deformations. By contrast, we treat wide baseline matching of keypoints as a classification problem, in which each class corresponds to the set of all possible views of one such keypoint. Given one or more images of a target object, we train the system by synthesizing a large number of views of individual keypoints and by using statistical classification tools to produce a compact description of this {\em view set}. At run-time, we rely on this description to decide to which class, if any, an observed feature belongs. This formulation allows us to use decision trees to reduce matching error rates, and to move some of the computational burden from matching to training, which can be performed beforehand. We will show that our method is both reliable and fast enough to detect an object and estimate its 3D pose in real time in the presence of occlusions, illumination changes, and cluttered backgrounds.

Related material