Abstract

The extraction and matching of interest points is a prerequisite for many geometric computer vision problems. Traditionally, matching has been achieved by assigning descriptors to interest points and matching points that have similar descriptors. In this paper, we propose a method by which interest points are instead already implicitly matched at detection time. As a result, descriptors no longer need to be calculated, stored, communicated, or matched. This is achieved by a convolutional neural network with multiple output channels, which can be thought of as a collection of detectors, each specialised to specific visual features. This paper describes how to design and train such a network so that it achieves successful relative pose estimation despite the limited number of interest points. While the overall matching score is slightly lower than with traditional methods, the approach is descriptor-free and thus enables localization systems with a significantly smaller memory footprint and multi-agent localization systems with lower bandwidth requirements. The network also outputs, for each interest point, the confidence that it will result in a valid match. We evaluate performance relative to state-of-the-art alternatives.
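To make the idea of "implicitly matched at detection time" concrete, here is a minimal sketch in plain Python. It assumes that each output channel contributes exactly one interest point per image (the location of its strongest response) and that this peak response can stand in for the per-point match confidence. Both are illustrative assumptions, as are the names `detect_one_per_channel`, `implicit_matches`, and the `min_conf` threshold. The point it shows is that the channel index alone identifies which points correspond, so no descriptors are computed or compared.

```python
from typing import List, Tuple

# One channel's response map: rows x cols of detector scores (toy stand-in
# for a CNN output channel; real maps would come from the network).
ResponseMap = List[List[float]]


def detect_one_per_channel(responses: List[ResponseMap]) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) of the strongest response in each channel.

    Assumption for illustration: each channel yields exactly one point.
    """
    points = []
    for channel in responses:
        best = (0, 0, channel[0][0])
        for r, row in enumerate(channel):
            for c, score in enumerate(row):
                if score > best[2]:
                    best = (r, c, score)
        points.append(best)
    return points


def implicit_matches(resp_a: List[ResponseMap], resp_b: List[ResponseMap], min_conf: float = 0.0):
    """Pair the points detected by the same channel in two images.

    No descriptors are involved: channel index k *is* the correspondence.
    Hypothetically, the weaker of the two peak scores is used as the match
    confidence, and matches below min_conf are discarded.
    """
    pts_a = detect_one_per_channel(resp_a)
    pts_b = detect_one_per_channel(resp_b)
    matches = []
    for k, (pa, pb) in enumerate(zip(pts_a, pts_b)):
        conf = min(pa[2], pb[2])
        if conf >= min_conf:
            matches.append((k, (pa[0], pa[1]), (pb[0], pb[1]), conf))
    return matches


# Toy example: two channels, 2x3 response maps for images A and B.
img_a = [
    [[0.1, 0.9, 0.2], [0.0, 0.3, 0.1]],  # channel 0 peaks at (0, 1)
    [[0.2, 0.1, 0.0], [0.4, 0.1, 0.8]],  # channel 1 peaks at (1, 2)
]
img_b = [
    [[0.0, 0.1, 0.2], [0.7, 0.3, 0.1]],  # channel 0 peaks at (1, 0)
    [[0.6, 0.1, 0.0], [0.2, 0.1, 0.3]],  # channel 1 peaks at (0, 0)
]

if __name__ == "__main__":
    print(implicit_matches(img_a, img_b, min_conf=0.65))
    # prints [(0, (0, 1), (1, 0), 0.7)]; channel 1 is dropped (confidence 0.6)
```

Only the per-channel pixel coordinates (plus, optionally, a confidence) would need to be stored or transmitted, which is where the memory and bandwidth savings described above come from. Combining the two peak scores with `min` is just one plausible choice for this sketch.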
