Advances in top-down and bottom-up approaches to video-based camera tracking

Marimón Sanjuán, David

doi:10.5075/epfl-thesis-3970

doctoral thesis

Advances in top-down and bottom-up approaches to video-based camera tracking

2007

Video-based camera tracking consists in trailing the three dimensional pose followed by a mobile camera using video as sole input. In order to estimate the pose of a camera with respect to a real scene, one or more three dimensional references are needed. Examples of such references are landmarks with known geometric shape, or objects for which a model is generated beforehand. By comparing what is seen by a camera with what is geometrically known from reality, it is possible to recover the pose of the camera that is sensing these references. In this thesis, we investigate the problem of camera tracking at two levels. Firstly, we work at the low level of feature point recognition. Feature points are used as references for tracking and we propose a method to robustly recognise them. More specifically, we introduce a rotation-discriminative region descriptor and an efficient rotation-discriminative method to match feature point descriptors. The descriptor is based on orientation gradient histograms and template intensity information. Secondly, we have worked at the higher level of camera tracking and propose a fusion of top-down (TDA) and bottom-up approaches (BUA). We combine marker-based tracking using a BUA and feature points recognised from a TDA into a particle filter. Feature points are recognised with the method described before. We take advantage of the identification of the rotation of points for tracking purposes. The goal of the fusion is to take advantage of their compensated strengths. In particular, we are interested in covering the main capabilities that a camera tracker should provide. These capabilities are automatic initialisation, automatic recovery after loss of track, and tracking beyond references known a priori. Experiments have been performed at the two levels of investigation. Firstly, tests have been conducted to evaluate the performance of the recognition method proposed. The assessment consists in a set of patches extracted from eight textured images. The images are rotated and matching is done for each patch. The results show that the method is capable of matching accurately despite the rotations. A comparison with similar techniques in the state of the art depicts the equal or even higher precision of our method with much lower computational cost. Secondly, experimental assessment of the tracking system is also conducted. The evaluation consists in four sequences with specific problematic situations namely, occlusions of the marker, illumination changes, and erratic and/or fast motion. Results show that the fusion tracker solves characteristic failure modes of the two combined approaches. A comparison with similar trackers shows competitive accuracy. In addition, the three capabilities stated earlier are fulfilled in our tracker, whereas the state of the art reveals that no other published tracker covers these three capabilities simultaneously. The camera tracking system has a potential application in the robotics domain. It has been successfully used as a man-machine interface and applied in Augmented Reality environments. In particular, the system has been used by students of the University of art and design Lausanne (ECAL) with the purpose of conceiving new interaction concepts. Moreover, in collaboration with ECAL and fabric | ch (studio for architecture & research), we have jointly developed the Augmented interactive Reality Toolkit (AiRToolkit). The system has also proved to be reliable in public events and is the basis of a game-oriented demonstrator installed in the Swiss National Museum of Audiovisual and Multimedia (Audiorama) in Montreux.

Name

EPFL_TH3970.pdf

Access type

openaccess

Size

6.59 MB

Format

Adobe PDF

Checksum (MD5)

c2c8ccd0cf38951d17d806c76675db7a