000174677 001__ 174677
000174677 005__ 20190619003313.0
000174677 0247_ $$2doi$$a10.5075/epfl-thesis-5310
000174677 02470 $$2urn$$aurn:nbn:ch:bel-epfl-thesis5310-4
000174677 02471 $$2nebis$$a6793223
000174677 037__ $$aTHESIS
000174677 041__ $$aeng
000174677 088__ $$a5310
000174677 245__ $$aLearning to Detect Objects with Minimal Supervision
000174677 269__ $$a2012
000174677 260__ $$bEPFL$$c2012$$aLausanne
000174677 300__ $$a145
000174677 336__ $$aTheses
000174677 520__ $$aMany classes of objects can now be successfully detected with statistical machine learning techniques. Faces, cars, and pedestrians have all been detected with low error rates by learning their appearance in a highly generic manner from extensive training sets. These recent advances have enabled the use of reliable object detection components in real systems, such as automatic face focusing functions on digital cameras. One key drawback of these methods, and the issue addressed here, is the prohibitive requirement that training sets contain thousands of manually annotated examples. We present three methods which make headway toward reducing labeling requirements and, in turn, toward a tractable solution to the general detection problem. First, we propose a new learning strategy for object detection. The proposed scheme forgoes the need to train a collection of detectors dedicated to homogeneous families of poses, and instead learns a single classifier that has the inherent ability to deform based on the signal of interest. We train a detector with a standard AdaBoost procedure by using combinations of pose-indexed features and pose estimators. This allows the learning process to select and combine various estimates of the pose with features able to compensate for variations in pose without the need to label data for training or explore the pose space in testing. We validate our framework on three types of data: hand video sequences, aerial images of cars, and face images. We compare our method to a standard Boosting framework, with access to the same ground truth, and show a reduction in the false alarm rate of up to an order of magnitude. Where possible, we compare our method to the state of the art, which requires pose annotations of the training data, and demonstrate comparable performance.
Second, we propose a new learning method which exploits temporal consistency to successfully learn a complex appearance model from a sparsely labeled training video. Our approach consists of iteratively improving an appearance-based model built with a Boosting procedure and reconstructing the trajectories corresponding to the motion of multiple targets. We demonstrate the efficiency of our procedure by learning a pedestrian detector from videos and a cell detector from microscopy image sequences. In both cases, our method is demonstrated to reduce the labeling requirement by one to two orders of magnitude. We show that in some instances, our method trained with sparse labels on a video sequence is able to outperform a standard learning procedure trained with the fully labeled sequence. Third, we propose a new active learning procedure which exploits the spatial structure of image data and queries entire scenes or frames of a video rather than individual examples. We extend the Query by Committee approach, allowing it to characterize the most informative scenes that are to be selected for labeling. We show that an aggressive procedure which exhibits zero tolerance to target localization error performs as well as more sophisticated strategies taking into account the trade-off between missed detections and localization error. Finally, we combine this method with our two proposed approaches above and demonstrate that the resulting algorithm can properly perform car detection from a small set of annotated images as well as pedestrian detection from a handful of labeled video frames.
000174677 6531_ $$aimage processing
000174677 6531_ $$acomputer vision
000174677 6531_ $$aobject detection
000174677 6531_ $$astatistical machine learning
000174677 6531_ $$asemi-supervised learning
000174677 6531_ $$aactive learning
000174677 6531_ $$atraitement d'image
000174677 6531_ $$avision par ordinateur
000174677 6531_ $$adétection d'objets
000174677 6531_ $$aapprentissage statistique automatique
000174677 6531_ $$aapprentissage semi-dirigé
000174677 6531_ $$aapprentissage actif
000174677 700__ $$0242723$$g179297$$aAli, Karim
000174677 720_2 $$aFua, Pascal$$edir.$$g112366$$0240252
000174677 720_2 $$aFleuret, François$$edir.$$g146262$$0240254
000174677 8564_ $$uhttps://infoscience.epfl.ch/record/174677/files/EPFL_TH5310.pdf$$zTexte intégral / Full text$$s7150111$$yTexte intégral / Full text
000174677 909C0 $$xU10659$$0252087$$pCVLAB
000174677 909C0 $$xU10381$$0252189$$pLIDIAP
000174677 909CO $$pDOI$$ooai:infoscience.tind.io:174677$$qGLOBAL_SET$$pIC$$pthesis$$pSTI$$pthesis-bn2018$$pthesis-public$$qDOI2
000174677 918__ $$dEDIC2005-2015$$cISIM$$aIC
000174677 919__ $$aCVLAB
000174677 919__ $$aLIDIAP
000174677 920__ $$b2012
000174677 970__ $$a5310/THESES
000174677 973__ $$sPUBLISHED$$aEPFL
000174677 980__ $$aTHESIS