DAISY: A Fast Descriptor for Dense Wide Baseline Stereo and Multiview Reconstruction

Tola, Engin

doi:10.5075/epfl-thesis-4830

doctoral thesis

2010

Stereo reconstruction is a fundamental problem in computer vision. It has been studied for more than three decades, and significant progress has been made in recent years, as evidenced by the quality of the models now being produced. This progress is closely tied to advances in other fields. With the emergence of low-cost, high-quality cameras, we now live in an era of abundant data for reconstruction. The multitude of images from numerous capture sources has sparked new interest in the stereo vision community because of new challenges such as robustness to photometric and geometric variability and scalability with respect to the number of images and image resolution. In this thesis, we aim to find efficient, and therefore practical, algorithmic solutions for the two extreme ends of the stereo vision problem: first, we consider the case of only two input images taken by cameras placed far from each other, and then we investigate large-scale multi-view reconstruction for ultra-high-resolution image sets. Each problem has its own challenges: in the first, we need to handle the large perspective distortions that the image texture undergoes, and in the second, we need to design an algorithm that can scale up to very large sets of ultra-high-resolution images using only a single standard computer. For the first problem, we design an efficient dense image descriptor, called DAISY, that is robust not only to photometric transforms such as brightness and contrast changes but also to the perspective effects that viewpoint changes produce. We use the DAISY descriptor as a photo-consistency measure in an expectation-maximization framework with a global graph-cuts optimization algorithm to estimate depth and occlusion maps. We demonstrate very successful results on a variety of data sets, some of which have laser-scanned ground truth.
After estimating the depth and occlusion maps, we introduce a technique to improve the surface reconstruction in occluded areas by extracting normal cues using simple binary classifiers trained on DAISY-like features. For the large-scale, ultra-high-resolution multi-view stereo problem, we design a very efficient local optimization algorithm for the depth estimation framework, in place of the global one developed in the first part of the thesis. Scalability over the number of images is handled by representing the scene with a set of depth maps, and scalability over image resolution is handled by using a local approach for depth map estimation. We demonstrate state-of-the-art quality results for very large sets of very high-resolution images, computed on a single standard computer in comparatively short computation times. Overall, we show that using a distinctive and robust descriptor to measure photo-consistency allows us to avoid many of the complex stages that other algorithms rely on without sacrificing accuracy, and thus to scale up to large data sets easily.

Name

EPFL_TH4830.pdf

Access type

restricted

Size

140.65 MB

Format

Adobe PDF

Checksum (MD5)

9fedd77f4a346d2d87e2bff27946f24d