Abstract

In this thesis, we study two closely related directions: robustness and generalization in modern deep learning. Deep learning models trained by empirical risk minimization are often non-robust to small, worst-case perturbations known as adversarial examples, which can easily fool state-of-the-art deep neural networks into making wrong predictions. Their existence can be seen as a generalization problem: despite impressive average-case performance, deep learning models tend to learn non-robust features that can be exploited for adversarial manipulation. We delve into a range of questions related to robustness and generalization, such as how to accurately evaluate robustness, how to make robust training more efficient, and why some optimization algorithms lead to better generalization and learn qualitatively different features.

We begin the first direction by exploring computationally efficient methods for adversarial training and their failure mode, referred to as catastrophic overfitting, in which the model suddenly loses its robustness at some point during training. We then work towards a better understanding of robustness evaluation and of the progress in the field by proposing new query-efficient black-box adversarial attacks based on random search; these attacks do not rely on gradient information and can thus complement a typical robustness evaluation based on gradient-based methods. Finally, with the same goal, we propose RobustBench, a new community-driven robustness benchmark that aims to systematically track the progress in the field in a standardized way.

We begin the second direction by investigating the reasons behind the success of sharpness-aware minimization, a recent algorithm that increases robustness in the parameter space during training and improves generalization of deep networks. We then discuss why overparameterized models trained with stochastic gradient descent tend to generalize surprisingly well even without any explicit regularization: we study the implicit regularization induced by stochastic gradient descent with large step sizes and its effect on the features learned by the model. Finally, we rigorously study the relationship between the sharpness of minima (i.e., robustness in the parameter space) and generalization, which prior works observed to correlate with each other. Our study suggests that, contrary to common belief, sharpness is not a good indicator of generalization: it correlates well with some hyperparameters, such as the learning rate, but not inherently with generalization.
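
To make the first direction concrete, below is a minimal PyTorch sketch (an illustration, not the exact method studied in the thesis) of the fast gradient sign method (FGSM), the single-step attack that underlies computationally efficient adversarial training; the function name and the perturbation budget eps are assumptions made for the example.

```python
# Minimal sketch: an L-infinity adversarial example via FGSM.
# The loss gradient w.r.t. the input gives the worst-case (loss-increasing) direction.
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=8 / 255):
    """Perturb x within an L-inf ball of radius eps to increase the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    # Take a single signed-gradient step, then clip to the valid image range.
    x_adv = x + eps * grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```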
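The black-box attacks mentioned above rely on random search rather than gradients. The following is a simplified, hypothetical sketch of such a gradient-free attack: it queries the model's loss on random L-infinity perturbation proposals and keeps a proposal only if it increases the loss. The helper predict_loss and all hyperparameters are illustrative assumptions, not the exact algorithm proposed in the thesis.

```python
# Minimal sketch: query-efficient random-search attack (gradient-free).
import torch

def random_search_attack(predict_loss, x, eps=8 / 255, n_queries=1000):
    """predict_loss(x_adv) -> scalar loss obtained by querying the black-box model."""
    # Start from a random corner of the L-inf ball around x.
    x_adv = (x + eps * torch.sign(torch.randn_like(x))).clamp(0.0, 1.0)
    best_loss = predict_loss(x_adv)
    for _ in range(n_queries):
        # Propose re-randomizing the perturbation on a small random subset of coordinates.
        delta = torch.where(torch.rand_like(x) < 0.05,
                            eps * torch.sign(torch.randn_like(x)),
                            x_adv - x)
        candidate = (x + delta).clamp(0.0, 1.0)
        loss = predict_loss(candidate)
        if loss > best_loss:  # keep the proposal only if it increases the loss
            x_adv, best_loss = candidate, loss
    return x_adv
```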
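For the second direction, the sketch below illustrates one sharpness-aware minimization (SAM) step, assuming the standard formulation: ascend the loss in weight space within a radius rho, compute the gradient at the perturbed weights, and apply that gradient at the original weights. The function signature and the value of rho are illustrative, not the exact implementation analyzed in the thesis.

```python
# Minimal sketch: one SAM update on top of an arbitrary base optimizer.
import torch

def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    model.zero_grad()
    # 1) Gradient at the current weights w.
    loss_fn(model(x), y).backward()
    grad_norm = torch.norm(torch.stack(
        [p.grad.norm() for p in model.parameters() if p.grad is not None]))
    # 2) Ascend to the (approximately) worst-case nearby weights w + eps.
    perturbations = []
    for p in model.parameters():
        if p.grad is None:
            continue
        eps = rho * p.grad / (grad_norm + 1e-12)
        p.data.add_(eps)
        perturbations.append((p, eps))
    model.zero_grad()
    # 3) Gradient at the perturbed weights, then undo the perturbation and step.
    loss_fn(model(x), y).backward()
    for p, eps in perturbations:
        p.data.sub_(eps)
    base_optimizer.step()
    model.zero_grad()
```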
