3D Geometry Representation using Multiview Coding of Image Tiles
Compression of dynamic 3D geometry obtained from depth sensors is challenging, because noise and temporal inconsistency inherent in acquisition of depth data means there is no one-to-one correspondence between sets of 3D points in consecutive time instants. In this paper, instead of coding 3D points (or meshes) directly, we propose to represent an object’s 3D geometry as a collection of tile images. Specifically, we first place a set of image tiles around an object. Then, we project the object’s 3D geometry onto the tiles that are interpreted as 2D depth images, which we subsequently encode using a modified multiview image codec tuned for piecewise smooth signals. The crux of the tile image framework is the “optimal” placement of image tiles—one that yields the best tradeoff in rate and distortion. We show that if only planar and cylindrical tiles are considered, then the optimal placement problem for K tiles can be mapped to a tractable piecewise linear approximation problem. We propose an efficient dynamic programming algorithm to find an optimal solution to the piecewise linear approximation problem. Experimental results show that optimal tiling outperforms naive tiling by up to 35% in rate reduction, and graph transform can further exploit the smoothness of the tile images for coding gain.