Abstract

Deep neural networks have achieved impressive results in many image classification tasks. However, since their performance is usually measured in controlled settings, it is important to ensure that their decisions remain correct when they are deployed in noisy environments. In fact, deep networks are not robust to a large variety of semantics-preserving image modifications, not even to imperceptible image changes -- known as adversarial perturbations -- that can arbitrarily flip the prediction of a classifier. The poor robustness of image classifiers to small data distribution shifts raises serious concerns about their trustworthiness. To build reliable machine learning models, we must design principled methods to analyze and understand the mechanisms that shape robustness and invariance. This is exactly the focus of this thesis.

First, we study the problem of computing sparse adversarial perturbations, and exploit the geometry of the decision boundaries of image classifiers to compute such perturbations very efficiently. We evaluate the robustness of deep networks to sparse adversarial perturbations on high-dimensional datasets, and reveal a qualitative correlation between the location of the perturbed pixels and the semantic features of the images. This correlation suggests a deep connection between adversarial examples and the data features that image classifiers learn. To better understand this connection, we provide a geometric framework that connects the distance of data samples to the decision boundary with the features present in the data. We show that deep classifiers have a strong inductive bias towards invariance to non-discriminative features, and that adversarial training exploits this property to confer robustness. We demonstrate that the invariances of robust classifiers are useful in data-scarce domains, and that an improved understanding of how the data influences the inductive bias of deep networks can be exploited to design more robust classifiers.

Finally, we focus on the challenging problem of generalization to unforeseen corruptions of the data, and we propose a novel data augmentation scheme that relies on simple families of max-entropy image transformations to confer robustness to common corruptions. We analyze our method, demonstrate the importance of the mixing strategy for synthesizing corrupted images, and reveal the robustness-accuracy trade-offs that arise in the context of common corruptions. The controllable nature of our method makes it easy to adapt to other tasks and to achieve robustness to distribution shifts in data-scarce applications. Overall, our results contribute to the understanding of the fundamental mechanisms of deep image classifiers, and pave the way for building more reliable machine learning systems that can be deployed in real-world environments.
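
The augmentation scheme is only described at a high level in the abstract. As a rough illustration of what a mixing-based augmentation pipeline of this kind can look like, the sketch below composes a few simple, semantics-preserving transformations and mixes the results with the clean image. The transformation families, the Dirichlet/Beta mixing weights, and all function names are illustrative assumptions (an AugMix-style mixing strategy is used purely as a stand-in), not the actual construction proposed in the thesis.

```python
# Illustrative sketch of a mixing-based data augmentation pipeline.
# The transformation families and mixing strategy below are assumptions
# for demonstration, not the thesis's actual method.

import numpy as np

def random_transform(img, rng):
    """Apply one randomly chosen, semantics-preserving transformation.

    The families used here (additive noise, per-channel scaling, a small
    spatial shift) are placeholders for the max-entropy transformation
    families mentioned in the abstract.
    """
    choice = rng.integers(3)
    if choice == 0:                       # additive Gaussian noise
        return np.clip(img + rng.normal(0.0, 0.05, img.shape), 0.0, 1.0)
    if choice == 1:                       # per-channel intensity jitter
        scale = rng.uniform(0.8, 1.2, size=(1, 1, img.shape[2]))
        return np.clip(img * scale, 0.0, 1.0)
    shift = rng.integers(-3, 4)           # small horizontal roll
    return np.roll(img, shift, axis=1)

def mix_augment(img, rng, width=3, depth=2, alpha=1.0):
    """Mix several randomly transformed versions of `img`.

    `width` augmentation chains are combined with Dirichlet weights, and
    the result is blended with the clean image using a Beta-distributed
    coefficient (an AugMix-style mixing strategy, used here only to
    illustrate the role of mixing).
    """
    weights = rng.dirichlet([alpha] * width)
    mixed = np.zeros_like(img)
    for w in weights:
        aug = img.copy()
        for _ in range(depth):            # compose a few transformations
            aug = random_transform(aug, rng)
        mixed += w * aug
    m = rng.beta(alpha, alpha)            # blend with the clean image
    return np.clip(m * img + (1.0 - m) * mixed, 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.random((32, 32, 3))       # stand-in for a normalized image
    augmented = mix_augment(clean, rng)
    print(augmented.shape, float(augmented.min()), float(augmented.max()))
```

In a training loop, such a function would be applied to each mini-batch sample before the forward pass, so that the classifier sees a diverse stream of synthetically corrupted images while the labels remain unchanged.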
