Infoscience

doctoral thesis

Learning stereo reconstruction with deep neural networks

Tulyakov, Stepan  
2020

Stereo reconstruction is the problem of recovering the 3D structure of a scene from a pair of images acquired from different viewpoints. It has been investigated for decades, and many successful methods have been developed.
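For a rectified image pair, the textbook pinhole relation between disparity and depth is Z = f · B / d. The following is a minimal sketch of that relation only, not code from the thesis; the function name `depth_from_disparity` and its parameters (focal length in pixels, baseline in meters) are assumptions chosen for illustration.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters of a point seen in a rectified stereo pair.

    Assumes an ideal pinhole model: Z = f * B / d, where d is the
    horizontal shift (disparity) of the point between the two images.
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers: 1000 px focal length, 0.5 m baseline, 50 px disparity.
print(depth_from_disparity(50, 1000, 0.5))
```

Because depth is inversely proportional to disparity, a fixed disparity error translates into a larger depth error for distant points.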

The main drawback of these methods is that they typically rely on a single depth cue, such as parallax, defocus blur, or shading, and thus are not as robust as the human visual system, which simultaneously relies on a range of monocular and binocular cues. This is mainly because it is hard to manually design a model that accounts for multiple depth cues. In this work, we address this problem by focusing on deep learning-based stereo methods that can discover a model for multiple depth cues directly from training data with ground truth depth.

The complexity of deep learning-based methods, however, requires very large training sets with ground truth depth, which are often hard or costly to collect. Furthermore, even when training data is available, it is often contaminated with noise, which reduces the effectiveness of supervised learning. In Chapter 3, we show that this problem can be alleviated with weakly supervised learning, which uses geometric constraints of the problem instead of ground truth depth.

Besides requiring large training sets, deep stereo methods are not as application-friendly as traditional methods: they have a large memory footprint, and their disparity range is fixed at training time. In Chapter 4, we address these problems by introducing a novel network architecture with a bottleneck, capable of processing large images and utilizing more context, and an estimator that makes the network less sensitive to stereo matching ambiguities and applicable to any disparity range without re-training.
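To illustrate how an estimator can be less sensitive to matching ambiguities, the sketch below contrasts a plain expectation over all disparities with an expectation restricted to a window around the most likely disparity. This is an assumed, simplified reading of the "sub-pixel MAP" keyword listed for this thesis, not the thesis's implementation; the function names and the window half-width of 4 are hypothetical.

```python
import numpy as np


def _softmax(scores):
    # Numerically stable softmax over a 1-D array of matching scores.
    scores = np.asarray(scores, dtype=float)
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()


def soft_argmax(scores):
    """Expected disparity under the full distribution (mean estimator)."""
    probs = _softmax(scores)
    return float((np.arange(len(probs)) * probs).sum())


def subpixel_map(scores, half_width=4):
    """Expected disparity in a window around the mode (assumed sketch)."""
    probs = _softmax(scores)
    mode = int(np.argmax(probs))
    lo = max(mode - half_width, 0)
    hi = min(mode + half_width, len(probs) - 1)
    window = probs[lo:hi + 1]
    # Renormalize inside the window, then take the local mean.
    return float((np.arange(lo, hi + 1) * window).sum() / window.sum())


# Ambiguous match: a strong peak at disparities 10-11, a rival peak at 40.
scores = np.full(64, -10.0)
scores[[10, 11]] = 5.0
scores[40] = 4.9
print(soft_argmax(scores), subpixel_map(scores))
```

With a bimodal distribution, the full expectation lands between the two peaks, while the windowed estimate stays at the dominant one. Neither function depends on a fixed number of disparity candidates, so the same code accepts a score vector of any length.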

Because deep learning-based methods discover depth cues directly from training data, they can be adapted to new data modalities without large modifications. In Chapter 5, we show that our method, developed for a conventional frame-based camera, can be used with a novel event-based camera, which has a higher dynamic range, lower latency, and low power consumption. Instead of sampling the intensity of all pixels at a fixed frequency, this camera asynchronously reports events of significant pixel intensity changes. To adapt our method to this new data modality, we propose a novel event sequence embedding module that first aggregates information locally across time, using a novel fully-connected layer for an irregularly sampled continuous domain, and then across the discrete spatial domain.
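The event-reporting behavior described above can be sketched for a single pixel with the commonly used idealized model, in which an event fires whenever the log intensity moves by a fixed contrast threshold from the last reference level. This is an illustrative assumption, not the camera model or code used in the thesis; the function name `events_from_samples` and the threshold value are hypothetical.

```python
import math


def events_from_samples(times, intensities, threshold=0.3):
    """Return a list of (time, polarity) events for one pixel.

    Idealized model (an assumption): an event of polarity +1 or -1 is
    emitted each time the log intensity departs from the reference level
    by at least `threshold`; the reference then moves by one threshold.
    """
    events = []
    reference = math.log(intensities[0])
    for t, value in zip(times[1:], intensities[1:]):
        delta = math.log(value) - reference
        while abs(delta) >= threshold:
            polarity = 1 if delta > 0 else -1
            events.append((t, polarity))
            reference += polarity * threshold
            delta = math.log(value) - reference
    return events


# The pixel brightens at t=2 and darkens at t=4; nothing is reported otherwise.
print(events_from_samples([0, 1, 2, 3, 4], [1.0, 1.0, 2.0, 2.0, 1.1]))
```

A static pixel produces no events at all, which is why the output is an irregularly timed sequence rather than a fixed-rate stream of frames.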

One interesting application of stereo is the reconstruction of a planet's surface topography from satellite stereo images. In Chapter 6, we describe a geometric calibration method, as well as mosaicing and stereo reconstruction tools, that we developed in the framework of the doctoral project for the Color and Stereo Surface Imaging System (CaSSIS) onboard ESA's Trace Gas Orbiter, orbiting Mars. For the calibration, we propose a novel method relying on starfield images, because the large focal length and complex optical distortion of the instrument preclude the use of standard methods. The scientific and practical results of this work are widely used by the scientific community.

Type
doctoral thesis
DOI
10.5075/epfl-thesis-7086
Author(s)
Tulyakov, Stepan  
Advisors
Fleuret, François  
•
Ivanov, Anton  
Jury

Prof. Pascal Frossard (president) ; Prof. François Fleuret, Dr Anton Ivanov (thesis directors) ; Prof. Alexandre Alahi, Prof. Raphael Sznitman, Dr Jan Dirk Wegner (examiners)

Date Issued

2020

Publisher

EPFL

Publisher place

Lausanne

Public defense date

2020-02-28

Thesis number

7086

Total pages

121

Subjects

stereo • deep learning • weakly supervised • efficient • sub-pixel cross-entropy • sub-pixel MAP • event-based camera • geometric calibration • CaSSIS

EPFL units
LIDIAP  
Faculty
STI  
School
IEL  
Doctoral School
EDEE  
Available on Infoscience
February 17, 2020
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/166273
  • Contact
  • infoscience@epfl.ch


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved