Information theoretic combination of classifiers with application to face detection

Combining several classifiers has become a very active subdiscipline of pattern recognition. For years, the pattern recognition community focused on seeking optimal learning algorithms able to produce very accurate classifiers. However, empirical experience has shown that it is often much easier to find several relatively good classifiers than a single very accurate predictor. The advantages of combining classifiers over single-classifier schemes are twofold: it reduces the computational requirements by using simpler models, and it can improve classification performance. It is commonly accepted that classifiers need to be complementary for aggregation to improve their performance. This complementarity is usually termed diversity in the classifier combination community. Although diversity is a very intuitive concept, explicitly using diversity measures to create classifier ensembles has not been as successful as expected. In this thesis, we propose an information theoretic framework for combining classifiers. In particular, we prove by means of information theoretic tools that diversity between classifiers is not sufficient to guarantee optimal classifier combination. In fact, we show that the diversity and the accuracies of the individual classifiers are generally in conflict: two very accurate classifiers cannot be diverse and, conversely, two very diverse classifiers will necessarily have poor classification performance. To resolve this conflict, we propose an information theoretic score (ITS) that sets a trade-off between these two quantities. A first possible application is to use this new score as a selection criterion for extracting a good ensemble from a predefined pool of classifiers. We also propose an ensemble creation technique based on AdaBoost that uses the information theoretic score to select the classifiers iteratively.
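The tension between accuracy and diversity can be illustrated with a minimal sketch, under assumptions that are not part of the thesis. Diversity is approximated here by the plain disagreement rate between two classifiers, not by the information theoretic quantities the thesis uses, and the helper names (`error_rate`, `disagreement`, `max_disagreement`) and the toy labels are hypothetical. Two classifiers can only disagree on a sample if at least one of them is wrong, so their disagreement rate can never exceed the sum of their error rates.

```python
def error_rate(pred, truth):
    """Fraction of samples on which the prediction differs from the true label."""
    return sum(p != t for p, t in zip(pred, truth)) / len(truth)


def disagreement(pred_a, pred_b):
    """Fraction of samples on which two classifiers output different labels.

    Used here as a simple stand-in for diversity (an assumption of this sketch).
    """
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


def max_disagreement(err_a, err_b):
    """Upper bound on disagreement: a disagreement requires at least one error."""
    return min(1.0, err_a + err_b)


# Toy data (hypothetical): two classifiers that are each 90% accurate
# and make their single error on different samples.
truth  = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
pred_a = [0, 1, 1, 0, 1, 0, 0, 1, 0, 0]  # wrong on sample 8
pred_b = [1, 1, 1, 0, 1, 0, 0, 1, 1, 0]  # wrong on sample 0

err_a = error_rate(pred_a, truth)
err_b = error_rate(pred_b, truth)
div = disagreement(pred_a, pred_b)
bound = max_disagreement(err_a, err_b)

print(f"errors: {err_a:.2f}, {err_b:.2f}; disagreement: {div:.2f}; bound: {bound:.2f}")
```

Read in the other direction, the same bound says that a high disagreement rate forces at least one of the two classifiers to have a high error rate. This is the kind of conflict that a combined score has to balance.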
As an illustration of efficient classifier combination techniques, we propose several algorithms for building ensembles of Support Vector Machines (SVMs). SVMs are one of the most popular discriminative approaches in pattern recognition and are often considered state of the art in binary classification. However, these classifiers have one severe drawback when facing a very large number of training examples: they become computationally expensive to train. This problem can be addressed by decomposing the learning into several classification tasks with lower computational requirements. We propose to train several SVMs in parallel on subsets of the complete training set, and we develop several algorithms for designing efficient SVM ensembles that take our information theoretic score into account.
The second part of this thesis concentrates on human face detection, which is a very challenging binary pattern recognition task. In this work, we focus on two main aspects: feature extraction and the application of classifier combination techniques to face detection systems. We introduce new geometrical filters, called anisotropic Gaussian filters, that model face appearance very efficiently. Finally, we propose a parallel mixture of boosted classifiers to reduce the false positive rate and decrease the training time while keeping the testing time unchanged. The complete face detection system is evaluated on several datasets, showing that it compares favorably to state-of-the-art techniques.
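The abstract does not give the definition of the anisotropic Gaussian filters, so the following is only a generic sketch of what "anisotropic Gaussian" commonly means: a 2D Gaussian with a different standard deviation along each of two orthogonal axes, rotated by an angle. The function name `anisotropic_gaussian_kernel` and its parameters (`size`, `sigma_u`, `sigma_v`, `theta`) are hypothetical, and the filters used in the thesis may include further components.

```python
import math


def anisotropic_gaussian_kernel(size, sigma_u, sigma_v, theta):
    """Return a size x size rotated, axis-scaled Gaussian kernel (rows of floats).

    size    -- odd kernel width and height in pixels
    sigma_u -- standard deviation along the rotated first axis
    sigma_v -- standard deviation along the rotated second axis
    theta   -- rotation angle in radians
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so that the kernel has a center pixel")
    half = size // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate pixel coordinates into the filter's own (u, v) frame.
            u = cos_t * x + sin_t * y
            v = -sin_t * x + cos_t * y
            row.append(math.exp(-0.5 * ((u / sigma_u) ** 2 + (v / sigma_v) ** 2)))
        kernel.append(row)
    # Normalize so that the weights sum to one.
    total = sum(sum(row) for row in kernel)
    return [[w / total for w in row] for row in kernel]


# A kernel elongated along the horizontal axis, and the same kernel rotated by 90 degrees.
horizontal = anisotropic_gaussian_kernel(7, sigma_u=3.0, sigma_v=1.0, theta=0.0)
vertical = anisotropic_gaussian_kernel(7, sigma_u=3.0, sigma_v=1.0, theta=math.pi / 2)
```

Choosing `sigma_u` different from `sigma_v` makes the response elongated in one direction, and `theta` orients that direction.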
