Vision-Based Scene Understanding with Sparsity Promoting Priors
Human beings are interested in understanding their environments and the dynamic content that fills their surroundings. For applications ranging from security to marketing, people have installed networks of cameras to capture the dynamic elements of scenes. In this thesis, we propose a complete real-time system to automatically analyze human behavior from any network of cameras. The proposed system leverages mixed networks of fixed and mobile cameras to locate people, track them, and analyze their trajectories. The mathematical frameworks underlying our proposed methods are based on the following claim: the dynamics of a scene are based on a small set of causes, and therefore can be parameterized by a few degrees of freedom. Every processing block of our system is driven by sparsity-promoting priors, i.e., just a few elements are sufficient to capture the scene dynamics. We first present our multi-view people localization algorithm, which is designed for a network of fixed cameras. An inverse problem with a sparsity constraint is formulated to detect people using the degraded foreground silhouettes extracted by the cameras. To solve this sparsity-driven formulation in a manner appropriate for a real-time implementation, we then propose an approach called "Set Covering Occupancy Object Pursuit" (SCOOP) that outperforms the state of the art. Next, we tackle the data association problem of finding correspondences between located people across time. We implement a graph-based greedy approach to reach real-time tracking performance. Unlike the fixed camera networks considered in the first part of this thesis, mobile cameras are uncalibrated and often have fields of view that do not overlap with those of other cameras. We propose a "Cascade of Grids of Image Descriptors" (CaGID) with a sparse search to accurately detect and track objects across uncalibrated cameras with non-overlapping fields of view.
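As an illustration of the sparsity-constrained localization idea described above, the following is a minimal sketch of a greedy set-covering pursuit. It is not the SCOOP algorithm itself, whose details are not given in this abstract. The binary dictionary `D` (one column per candidate ground location, holding the silhouette a person standing there would produce), the stacked foreground mask `y`, and the parameters `max_people` and `min_gain` are all assumptions introduced for this example.

```python
import numpy as np

def greedy_occupancy(D, y, max_people=10, min_gain=1):
    """Greedily select few dictionary columns that cover the foreground mask.

    D: (n_pixels, n_locations) binary matrix (hypothetical dictionary).
    y: (n_pixels,) binary foreground mask stacked over all cameras.
    Returns the indices of the selected ground locations.
    """
    D = np.asarray(D, dtype=bool)
    y = np.asarray(y, dtype=bool)
    residual = y.copy()  # foreground pixels not yet explained
    # Pixels each atom would paint onto the background (false positives).
    penalty = (D & ~y[:, None]).sum(axis=0)
    selected = []
    for _ in range(max_people):
        # Foreground pixels each atom would newly explain.
        covered = (D & residual[:, None]).sum(axis=0)
        gain = covered - penalty
        best = int(np.argmax(gain))
        if gain[best] < min_gain:
            break  # no atom explains enough: keep the solution sparse
        selected.append(best)
        residual &= ~D[:, best]
    return selected
```

Stopping as soon as no atom yields a positive gain is what keeps the occupancy vector sparse in this sketch: only a few locations are ever switched on.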
We evaluate the ability of such mixed networks of cameras to alert drivers to a potential collision with pedestrians. For this application, a camera mounted in a vehicle collaborates with a network of fixed cameras installed in a city. Finally, the proposed system is evaluated for coaching and marketing purposes. The behavior of people in sports games and stores is analyzed in real-time with a graph-based algorithm coined "SpotRank". A probability map inspired by the PageRank algorithm is proposed to rank the most salient 'hot spots' based upon mutual flows. Several public data sets have been used to quantitatively and qualitatively evaluate the performance of our system. To our knowledge, it is the first system to capture the behavior of people in crowded environments and analyze this behavior in real-time with sparsity priors.
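To make the PageRank-inspired ranking of hot spots more concrete, here is a minimal sketch of a standard PageRank power iteration applied to a matrix of flows between ground cells. It is not the SpotRank algorithm itself, which this abstract does not specify. The `flows` matrix (entry `[i, j]` counting people moving from cell `i` to cell `j`), the function name, and the damping value are assumptions made for illustration.

```python
import numpy as np

def rank_hot_spots(flows, damping=0.85, tol=1e-10, max_iter=1000):
    """Return a probability vector ranking cells by incoming flow.

    flows: (n, n) non-negative matrix of transitions between cells
           (hypothetical input format).
    """
    flows = np.asarray(flows, dtype=float)
    n = flows.shape[0]
    out = flows.sum(axis=1, keepdims=True)
    # Row-normalize; cells with no outgoing flow jump uniformly.
    P = np.where(out > 0, flows / np.where(out > 0, out, 1.0), 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = damping * (r @ P) + (1.0 - damping) / n
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r
```

For example, `rank_hot_spots([[0, 5, 0], [0, 0, 5], [0, 5, 0]])` gives the highest score to the middle cell, which receives flow from both of the others.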
Keywords: people detection ; object tracking ; object matching ; people behavior analysis ; sparsity ; convex optimization ; multi-view ; mobile camera ; fixed camera ; non-overlapping ; spotrank ; scoop ; cagid ; cascade ; image descriptor ; master-slave ; foreground extraction ; dictionary ; pedestrian detection ; object identification ; behavior analysis ; camera network
Thesis, École polytechnique fédérale de Lausanne (EPFL), no. 5070 (2011)
Programme doctoral Informatique, Communications et Information
Faculté des sciences et techniques de l'ingénieur
Institut de génie électrique et électronique
Laboratoire de traitement des signaux 2
Record created on 2011-04-27, modified on 2016-12-12