Representing and reconstructing 3D deformable shapes are two tightly linked problems that have long been studied in computer vision. Deformable shapes are ubiquitous in the real world, whether they belong to specific object classes such as humans, garments and animals, or to more abstract ones such as generic materials deforming under an external force. Practical computer vision algorithms must be able to understand the shapes of objects in the observed scenes to unlock a wide spectrum of sought-after applications, ranging from virtual try-on to automated surgery.
Automatic shape reconstruction is known to be an ill-posed problem, especially in the common scenario of a single input image. Modern approaches therefore rely on the deep learning paradigm, which has proven extremely effective even for severely under-constrained computer vision problems. We, too, exploit the success of data-driven approaches; however, we also show that generic deep learning models can greatly benefit from being combined with explicit knowledge originating in computational geometry. We analyze the use of various 3D shape representations and focus on one of them, the atlas-based representation, which turns out to be especially suitable for modeling deformable shapes and which we further improve and extend.
The atlas-based representation models a surface as an ensemble of continuous functions and thus allows for arbitrary resolution and analytical surface analysis. We identify its major shortcomings, namely patch collapse, patch overlap and strong mapping distortions, and we propose novel regularizers based on analytically computed properties of the reconstructed surfaces. Our approach counteracts these drawbacks while yielding higher reconstruction accuracy.
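To make the idea of an analytically computed surface property concrete, the following minimal sketch treats one atlas patch as a mapping from 2D parameters to 3D points and evaluates its metric tensor (first fundamental form) and area element. It is an illustration under stated assumptions, not the regularizer used in the thesis: the abstract does not say which properties are computed, the mappings `cylinder_patch` and `collapsed_patch` are hypothetical, and central finite differences stand in for the automatic differentiation a learned mapping would normally use.

```python
import numpy as np


def cylinder_patch(uv):
    """Hypothetical patch mapping (u, v) onto a unit cylinder in 3D."""
    u, v = uv
    return np.array([np.cos(u), np.sin(u), v])


def collapsed_patch(uv):
    """Hypothetical degenerate patch: the whole 2D domain maps onto a line."""
    u, v = uv
    return np.array([u + v, u + v, 0.0])


def jacobian(f, uv, eps=1e-5):
    """Central-difference 3x2 Jacobian of f at the 2D point uv.

    Assumption: finite differences replace autodiff to keep the sketch
    dependency-free.
    """
    uv = np.asarray(uv, dtype=float)
    cols = []
    for i in range(2):
        step = np.zeros(2)
        step[i] = eps
        cols.append((f(uv + step) - f(uv - step)) / (2.0 * eps))
    return np.stack(cols, axis=1)


def metric_tensor(f, uv):
    """First fundamental form g = J^T J, a symmetric 2x2 matrix."""
    J = jacobian(f, uv)
    return J.T @ J


def area_element(f, uv):
    """sqrt(det g): local area scaling of the mapping at uv.

    The determinant is clamped at zero to absorb tiny negative values
    caused by floating-point round-off.
    """
    return np.sqrt(max(np.linalg.det(metric_tensor(f, uv)), 0.0))
```

In this sketch the cylinder mapping has an identity metric tensor, meaning no distortion, while the collapsed mapping has an area element close to zero. A regularizer could penalize such quantities at sampled points, although the exact form of the penalty is left open here.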
We then examine the atlas-based shape representation more closely and focus on another design flaw, the global inconsistency of the mappings. While this flaw is not reflected in quantitative metrics, it is detrimental to the visual quality of the reconstructed surfaces. To address it, we design loss functions that encourage communication among the mappings, which pushes the resulting surface towards a C1-smooth function and thus dramatically improves the visual quality.
Furthermore, we adapt the atlas-based representation so that it can model a full sequence of a deforming object in a temporally consistent way. The goal is to produce a reconstruction in which each surface point always represents the same semantic point on the target ground-truth surface. To achieve this behavior, we note that if the surface deforms close-to-isometrically at each point, the semantic location of that point likely remains unchanged. In practice, we use the Riemannian metric and force it to remain point-wise constant throughout the sequence. The experiments show that our method yields state-of-the-art results on the correspondence estimation task.
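One way to read the requirement that the metric remain point-wise constant is as a penalty on the change of the metric tensor between consecutive frames at fixed parameter-space points. The sketch below is an assumption-laden illustration rather than the loss used in the thesis: the squared Frobenius difference, the function names, and the two toy deformations `bending` and `stretching` are all hypothetical, and finite differences again stand in for automatic differentiation.

```python
import numpy as np


def metric_tensor(f, uv, t, eps=1e-5):
    """2x2 metric tensor J^T J of the time-dependent mapping f(uv, t)."""
    uv = np.asarray(uv, dtype=float)
    cols = []
    for i in range(2):
        step = np.zeros(2)
        step[i] = eps
        cols.append((f(uv + step, t) - f(uv - step, t)) / (2.0 * eps))
    J = np.stack(cols, axis=1)
    return J.T @ J


def metric_consistency_loss(f, points, times):
    """Mean squared Frobenius change of the metric between consecutive frames.

    Assumption: this particular penalty is one possible choice; it needs
    at least two time steps and one sample point.
    """
    total, count = 0.0, 0
    for uv in points:
        metrics = [metric_tensor(f, uv, t) for t in times]
        for g_a, g_b in zip(metrics[:-1], metrics[1:]):
            total += np.sum((g_a - g_b) ** 2)
            count += 1
    return total / count


def bending(uv, t):
    """Hypothetical isometric deformation: a flat strip rolled onto a
    cylinder of radius 1/t, which preserves lengths along the surface."""
    u, v = uv
    r = 1.0 / t
    return np.array([r * np.sin(u / r), r * (1.0 - np.cos(u / r)), v])


def stretching(uv, t):
    """Hypothetical non-isometric deformation: stretch along u over time."""
    u, v = uv
    return np.array([(1.0 + t) * u, v, 0.0])


points = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.3)]
times = [0.5, 1.0, 1.5]
bending_loss = metric_consistency_loss(bending, points, times)
stretching_loss = metric_consistency_loss(stretching, points, times)
```

With these toy inputs, `bending_loss` is numerically zero because bending preserves the metric, whereas `stretching_loss` is clearly positive (about 4.06). This matches the intuition that a low penalty corresponds to near-isometric motion in which each parameter-space point keeps tracking the same material point.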
Finally, we look into the particular problem of monocular reconstruction of texture-less deformable shapes. We propose a multi-task learning approach which jointly produces a normal map, a depth map and a mesh corresponding to the observed surface. We show that producing multiple different 3D representations of the same object results in higher reconstruction quality. We acquire a large real-world annotated dataset of texture-less deforming objects and release it for public use.