Aggregating Spatial and Photometric Context for Photometric Stereo
Photometric stereo, a computer vision technique for estimating the 3D shape of objects through images captured under varying illumination conditions, has been a topic of research for nearly four decades. In its general formulation, photometric stereo is an ill-posed problem and requires robust prior knowledge of material reflectance properties, light transport, and object shapes, all of which are quite difficult to obtain in many scenarios.
We focus on the task of estimating the surface normals of an inspected object given a large, but a priori unknown, number of input images and the illumination directions under which these images were captured. This task, known as far-field dense calibrated photometric stereo, is the main topic of this thesis.
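To make the task concrete, the sketch below solves the calibrated problem for the simplest case, a purely Lambertian surface, by per-pixel least squares. This is the classical baseline rather than the method proposed in the thesis, and the function name `lambertian_normals` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def lambertian_normals(intensities, light_dirs):
    """Least-squares normals under a Lambertian assumption (classical baseline).

    intensities: (K, P) array, K images of P pixels each.
    light_dirs:  (K, 3) array of unit illumination directions.
    Returns (P, 3) unit normals and (P,) albedo values.
    """
    # Solve light_dirs @ b = intensities for b = albedo * normal, per pixel.
    b, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)  # shape (3, P)
    albedo = np.linalg.norm(b, axis=0)
    normals = (b / np.maximum(albedo, 1e-12)).T
    return normals, albedo

# Synthetic single-pixel check: 20 lights close to the viewing direction,
# so the surface is lit in every image and no shadows need to be modelled.
rng = np.random.default_rng(0)
xy = rng.uniform(-0.3, 0.3, size=(20, 2))
z = np.sqrt(1.0 - np.sum(xy**2, axis=1))
lights = np.column_stack([xy, z])           # (20, 3), unit length
true_n = np.array([0.0, 0.6, 0.8])          # hypothetical ground-truth normal
obs = 0.5 * (lights @ true_n)[:, None]      # (20, 1), albedo 0.5

normals, albedo = lambertian_normals(obs, lights)
```

Real materials deviate from the Lambertian model, which is why the general problem needs the learned reflectance and shape priors discussed in this abstract.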
As in many other computer vision fields, recent advances in photometric stereo have leveraged deep learning. Despite their success, these methods struggle with the high dimensionality of the input data, the disparity between the spatial domain and the domain of illumination directions, the a priori unknown number of observations provided for a scene, and the general unavailability of extensive real data collections to train them.
To tackle these issues, we formulate the problem as a four-dimensional regression and propose novel neural architectures that leverage both the spatial context of individual images and the photometric context captured in the intensity variations of individual pixels under different illumination directions. Our methods build on the concept of observation maps -- fixed-size two-dimensional planes that encode, for each pixel separately, the pixel intensities together with the associated illumination directions. This framework enabled the design of fully convolutional networks utilizing separable four-dimensional convolutions, which simultaneously process the observation-map and image spatial dimensions, thus learning both reflectance and shape priors. With this approach, we achieve higher performance than existing methods.
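The following minimal sketch illustrates the two ideas in this paragraph under stated assumptions. The abstract does not specify how illumination directions are mapped onto the observation map, so the projection used here (the x and y components of the unit light direction scaled to grid indices) is an assumption, as are the map size, the fixed box kernels, and all function names. A trained network would learn its kernels and use many channels. The separable filter is shown as one 2D convolution over the observation-map axes followed by one over the image axes.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def observation_map(intensities, light_dirs, size=8):
    """Build a fixed-size observation map for one pixel.

    intensities: (K,) non-negative pixel intensities, one per image.
    light_dirs:  (K, 3) unit illumination directions.
    size:        side length of the square map (hypothetical default).
    """
    obs = np.zeros((size, size), dtype=float)
    # Assumed projection: map x and y from [-1, 1] to indices [0, size - 1].
    cols = np.rint((light_dirs[:, 0] + 1.0) * 0.5 * (size - 1)).astype(int)
    rows = np.rint((light_dirs[:, 1] + 1.0) * 0.5 * (size - 1)).astype(int)
    # When several lights fall into one cell, keep the brightest observation.
    np.maximum.at(obs, (rows, cols), np.asarray(intensities, dtype=float))
    return obs

def conv2d_over_axes(t, kernel, axes):
    """Zero-padded, same-size 2D cross-correlation of `t` along two axes."""
    kh, kw = kernel.shape  # odd sizes assumed
    pad = [(0, 0)] * t.ndim
    pad[axes[0]] = (kh // 2, kh // 2)
    pad[axes[1]] = (kw // 2, kw // 2)
    windows = sliding_window_view(np.pad(t, pad), (kh, kw), axis=axes)
    # windows has shape t.shape + (kh, kw); contract the two window axes.
    return np.einsum("...ij,ij->...", windows, kernel)

def separable_conv4d(t, k_uv, k_xy):
    """Filter a (H, W, S, S) tensor over map axes, then over image axes."""
    t = conv2d_over_axes(t, k_uv, axes=(2, 3))
    return conv2d_over_axes(t, k_xy, axes=(0, 1))

# Toy input: K random lights and K random 4x4 images (values are arbitrary).
rng = np.random.default_rng(0)
K, H, W, S = 10, 4, 4, 8
lights = rng.normal(size=(K, 3))
lights[:, 2] = np.abs(lights[:, 2]) + 0.5
lights /= np.linalg.norm(lights, axis=1, keepdims=True)
images = rng.uniform(size=(K, H, W))

# One observation map per pixel gives a 4D tensor of shape (H, W, S, S).
maps = np.array([[observation_map(images[:, y, x], lights, S)
                  for x in range(W)] for y in range(H)])
box = np.full((3, 3), 1.0 / 9.0)
features = separable_conv4d(maps, box, box)
```

Because the map size is fixed, the tensor shape does not depend on the number of input images `K`, which is what lets a single network handle an a priori unknown number of observations.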
Additionally, we introduce a fast rendering approach for on-the-fly sample generation during training, which allows for much larger diversity in shape and reflectance properties than existing static datasets offer. Coupled with an efficient training strategy, this approach enables training the four-dimensional neural architectures on standard consumer hardware within a reasonable timeframe. These innovations have culminated in state-of-the-art qualitative performance on all relevant benchmark datasets that feature real images, thus making a significant contribution to the field of photometric stereo.
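As a rough illustration of on-the-fly sample generation, the sketch below yields freshly rendered training samples from a Python generator instead of reading a static dataset. The renderer described in the thesis is not specified in this abstract. The Lambertian sphere, the albedo range, the light sampling, and the names `sphere_normals` and `sample_stream` are all placeholder assumptions chosen to keep the example short.

```python
import numpy as np

def sphere_normals(res):
    """Unit normals of a front-facing sphere on a res x res grid, plus a mask."""
    ys, xs = np.mgrid[-1:1:res * 1j, -1:1:res * 1j]
    r2 = xs**2 + ys**2
    mask = r2 < 1.0
    zs = np.sqrt(np.clip(1.0 - r2, 0.0, None))
    normals = np.stack([xs, ys, zs], axis=-1)
    normals[~mask] = 0.0  # background pixels carry no normal
    return normals, mask

def sample_stream(res=16, max_lights=12, seed=0):
    """Endlessly yield (images, lights, normals, mask) training samples."""
    rng = np.random.default_rng(seed)
    normals, mask = sphere_normals(res)
    while True:
        # The number of lights varies from sample to sample.
        k = int(rng.integers(3, max_lights + 1))
        lights = rng.normal(size=(k, 3))
        lights[:, 2] = np.abs(lights[:, 2]) + 0.5  # keep lights in front
        lights /= np.linalg.norm(lights, axis=1, keepdims=True)
        albedo = rng.uniform(0.2, 1.0)
        # Lambertian shading with attached shadows clamped to zero.
        shading = np.clip(np.einsum("hwc,kc->khw", normals, lights), 0.0, None)
        yield albedo * shading, lights, normals, mask

stream = sample_stream()
images, lights, normals, mask = next(stream)
```

Each call to `next(stream)` produces a new sample, so the training loop never sees the same lighting configuration twice unless the seed is reset.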