Fast Human Detection in Videos using Joint Appearance and Foreground Learning from Covariances of Image Feature Subsets

We present a fast method for detecting humans in videos from stationary surveillance cameras. Traditional approaches use background subtraction as an attentive filter, applying still-image detectors only to foreground regions. This ignores the fact that foreground observations carry human shape information that can itself be used for detection. To address this issue, we propose a method that learns the correlation between appearance and foreground information. It is based on a cascade of LogitBoost classifiers that uses covariance matrices computed from appearance and foreground features as object descriptors. We account for the fact that covariance matrices lie in a Riemannian space and introduce several novelties to reduce the resulting computational load, such as exploiting only covariance sub-matrices. We also introduce an image rectification scheme that removes the slant of people in images from wide-angle cameras. Evaluation on a large set of videos shows that our approach performs better than the attentive filter paradigm while processing 5 to 20 frames/sec. In addition, on the INRIA human (static image) benchmark database, our sub-matrix approach performs better than the full covariance case while reducing the computation cost by more than one order of magnitude.
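To make the descriptor idea concrete, the following is a minimal sketch of a region covariance descriptor restricted to a feature subset. It is not the paper's implementation. The function names, the feature layout, and the regularization constant `eps` are all hypothetical. The mapping to a vector space uses the matrix logarithm at the identity, a common simplification chosen here for brevity, whereas the abstract only states that the Riemannian structure is taken into account.

```python
import numpy as np

def region_covariance(features):
    """Covariance of per-pixel feature vectors in a region.

    features: array of shape (H, W, d), e.g. appearance and foreground
    features stacked along the last axis (layout is an assumption).
    Returns a d x d symmetric matrix.
    """
    d = features.shape[-1]
    samples = features.reshape(-1, d)      # one row per pixel
    return np.cov(samples, rowvar=False)

def sub_covariance(cov, idx):
    """Keep only the rows and columns of the chosen feature subset."""
    idx = np.asarray(idx)
    return cov[np.ix_(idx, idx)]

def log_map_identity(cov, eps=1e-6):
    """Matrix logarithm of a symmetric positive definite matrix.

    Assumption: the tangent space is taken at the identity.
    eps is a hypothetical regularizer that keeps eigenvalues positive.
    """
    cov = cov + eps * np.eye(cov.shape[0])
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.maximum(eigvals, eps)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T

def vectorize_sym(sym):
    """Stack diagonal and upper triangle of a symmetric matrix.

    Off-diagonal terms are scaled by sqrt(2) so that the Euclidean norm
    of the vector equals the Frobenius norm of the matrix.
    """
    upper = np.triu_indices(sym.shape[0], k=1)
    return np.concatenate([np.diag(sym), np.sqrt(2.0) * sym[upper]])

# Usage with random stand-in features (8 features per pixel).
rng = np.random.default_rng(0)
feats = rng.normal(size=(32, 16, 8))
full = region_covariance(feats)
sub = sub_covariance(full, [0, 1, 7])      # a 3-feature subset
descriptor = vectorize_sym(log_map_identity(sub))
```

A subset of m features yields a descriptor of length m(m+1)/2, and the eigen-decomposition needed for the logarithm is performed on an m x m matrix instead of a d x d one, which is one plausible source of the savings reported for the sub-matrix approach.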
