Files

Abstract

In this paper, we propose to study the problem of localization of a dense set of people with a network of heterogeneous cameras. We propose to recast the problem as a linear inverse problem. The proposed framework is generic to any scene, scalable in the number of cameras and versatile with respect to their geometry, e.g. planar or omnidirectional. It relies on deducing an \emph {occupancy vector}, i.e. the discretized occupancy of people on the ground, from the noisy binary silhouettes observed as foreground pixels in each camera. This inverse problem is regularized by imposing a sparse occupancy vector, i.e. made of few non- zero elements, while a particular dictionary of silhouettes linearly maps these non-empty grid locations to the multiple silhouettes viewed by the cameras network. This constitutes a linearization of the problem, where non- linearities, such as occlusions, are treated as additional noise on the observed silhouettes. Mathematically, we express the final inverse problem either as Basis Pursuit DeNoise or Lasso convex optimization programs. The sparsity measure is reinforced by iteratively re-weighting the $\ell_1$-norm of the occupancy vector for better approximating its $\ell_0$ ``norm'', and a new kind of ``repulsive'' sparsity is used to adapt further the Lasso procedure to the occupancy reconstruction. Practically, an adaptive sampling process is proposed to reduce the computation cost and monitor a large occupancy area. Qualitative and quantitative results are presented on a basketball game. The proposed algorithm successfully detects people occluding each other given severely degraded extracted features, while outperforming state-of-the-art people localization techniques.

Details

Actions

Preview