In this paper, we show that, in a multi-camera context, occlusions can be handled effectively at each time frame independently, even when the only available data is the binary output of a fairly primitive motion detector. We start from occupancy probability estimates in a top view and use a generative model to produce probability images, which are compared with the actual input images. We then refine the estimates so that the probability images match the binary input images as closely as possible. We demonstrate the quality of our results on several sequences involving complex occlusions.
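
To make the estimate-synthesize-refine loop concrete, the following is a minimal toy sketch, not the method of this paper. It assumes a single camera with a one-dimensional "image", a hypothetical fixed pixel mask for each ground location, a squared-error distance between the probability image and the binary input, and plain projected gradient descent as the refinement step. All names (`synthesize`, `loss`, `refine`, `masks`, `binary`) and all numeric values are illustrative assumptions.

```python
# Toy sketch only: the masks, step size, squared-error distance, and
# gradient-descent refinement are illustrative assumptions.

def synthesize(q, masks, n_pixels):
    """Probability image: chance that each pixel is covered by at least
    one occupied location, treating locations as independent."""
    image = []
    for p in range(n_pixels):
        prob_uncovered = 1.0
        for k, mask in enumerate(masks):
            if p in mask:
                prob_uncovered *= 1.0 - q[k]
        image.append(1.0 - prob_uncovered)
    return image


def loss(q, masks, binary):
    """Squared error between the probability image and the binary input."""
    synth = synthesize(q, masks, len(binary))
    return sum((a - b) ** 2 for a, b in zip(synth, binary))


def refine(q, masks, binary, step=0.1, iters=200):
    """Projected gradient descent on the occupancy probabilities q."""
    q = list(q)
    for _ in range(iters):
        synth = synthesize(q, masks, len(binary))
        grad = [0.0] * len(q)
        for k, mask in enumerate(masks):
            for p in mask:
                # d synth[p] / d q[k]: product of (1 - q[j]) over the
                # other locations whose masks also cover pixel p.
                others = 1.0
                for j, other in enumerate(masks):
                    if j != k and p in other:
                        others *= 1.0 - q[j]
                grad[k] += 2.0 * (synth[p] - binary[p]) * others
        # Gradient step, then clamp each probability back into [0, 1].
        q = [min(1.0, max(0.0, qk - step * gk)) for qk, gk in zip(q, grad)]
    return q


# Hypothetical scene: locations 0 and 1 overlap in pixels 2-3 (an
# occlusion), location 2 is separate. The detector fires on pixels 0-5.
masks = [{0, 1, 2, 3}, {2, 3, 4, 5}, {6, 7}]
binary = [1, 1, 1, 1, 1, 1, 0, 0]
initial = [0.5, 0.5, 0.5]
refined = refine(initial, masks, binary)
```

In this toy scene the refinement drives the first two occupancy probabilities toward 1 and the third toward 0, since only that combination explains the detected pixels. The overlapping masks show why the locations must be estimated jointly: the pixels shared by locations 0 and 1 are explained by either one.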