This paper addresses the problem of efficient representation and compression of scenes captured by distributed vision sensors. We propose a novel geometrical model to describe the correlation between different views of a three-dimensional scene. We first approximate the camera images by a sparse expansion over a dictionary of geometric atoms, since the most important visual features are likely to be equally dominant in the images from multiple cameras. The correlation model is then built on local geometrical transformations between corresponding features in different views, where correspondences are defined by shape and epipolar geometry constraints. Based on this geometrical framework, we design a distributed coding scheme with side information, which builds an efficient representation of the scene without communication between cameras. The Wyner-Ziv encoder partitions the dictionary into cosets of atoms that are dissimilar in shape and in position in the image. The joint decoder then determines pairwise correspondences between atoms in the reference image and atoms in the cosets of the Wyner-Ziv image, selecting the most likely correspondence among the pairs of atoms that satisfy the epipolar geometry constraints. Atom pairing makes it possible to estimate the local transformations between correlated images, which are then used to refine the side information provided by the reference image. Experiments demonstrate that the proposed method leads to reliable estimation of the geometric transformations between views. The distributed coding scheme offers rate-distortion performance similar to that of joint encoding at low bit rates, and it outperforms methods based on independent decoding of the different images.
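
The decoder-side pairing step can be illustrated with a minimal sketch. It assumes that each atom is summarized by a position, a scale and an angle, that a fundamental matrix F relating the two views is known, and that shape similarity is measured by a simple sum of absolute parameter differences. The atom fields, the function names `epipolar_distance` and `select_atom`, the tolerance `tol` and the cost function are all hypothetical choices made for illustration; they are not the likelihood criterion of the proposed scheme.

```python
import numpy as np


def epipolar_distance(F, p_ref, p_wz):
    """Distance from p_wz (Wyner-Ziv image) to the epipolar line of p_ref."""
    x1 = np.array([p_ref[0], p_ref[1], 1.0])  # homogeneous point, reference view
    x2 = np.array([p_wz[0], p_wz[1], 1.0])    # homogeneous point, Wyner-Ziv view
    line = F @ x1                             # epipolar line (a, b, c) in the second view
    return abs(x2 @ line) / np.hypot(line[0], line[1])


def select_atom(F, ref_atom, coset, tol=1.0):
    """Among coset atoms within tol of the epipolar line, return the one
    closest in shape to ref_atom (hypothetical cost), or None if none qualify."""
    best, best_cost = None, np.inf
    for atom in coset:
        if epipolar_distance(F, ref_atom["pos"], atom["pos"]) > tol:
            continue  # violates the epipolar constraint
        # Assumed shape cost: absolute differences in scale and angle (no angle wrap-around).
        cost = abs(atom["scale"] - ref_atom["scale"]) + abs(atom["angle"] - ref_atom["angle"])
        if cost < best_cost:
            best, best_cost = atom, cost
    return best


# Assumed setup: rectified cameras, so epipolar lines are horizontal (same row).
F_rect = np.array([[0.0, 0.0, 0.0],
                   [0.0, 0.0, -1.0],
                   [0.0, 1.0, 0.0]])
ref = {"pos": (10.0, 20.0), "scale": 2.0, "angle": 0.0}
coset = [
    {"pos": (50.0, 80.0), "scale": 2.0, "angle": 0.0},  # far from the epipolar line
    {"pos": (14.0, 20.0), "scale": 2.1, "angle": 0.1},  # on the line, similar shape
    {"pos": (30.0, 20.5), "scale": 5.0, "angle": 0.0},  # near the line, dissimilar shape
]
best = select_atom(F_rect, ref, coset)
```

In this toy configuration the first candidate is rejected by the epipolar test even though its shape matches exactly, which reflects why the cosets group atoms that are dissimilar in both shape and position: the geometric constraint and the shape comparison together are meant to leave a single plausible candidate.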