The influence of data on adversarial robustness
Deep learning models are generally vulnerable to small input perturbations, raising concerns about safety, security, and performance guarantees. Achieving robustness has been shown to be costly in terms of computational resources and quantity of data. This thesis studies the data scalability challenges of existing approaches and suggests a different angle: improving models through better training data which, in turn, helps them learn better features.
First, we emphasize the importance of training adversarially robust deep neural networks. We study mathematically well-defined tasks such as automatic modulation classification (AMC) and show that solely pursuing high accuracy can lead to learning features that arise from spurious correlations. In well-defined 2D tasks, we find that standardly trained models learn spurious correlations because of the high input dimensionality, and we show that modifying the training data, particularly its labels, can solve this problem effectively.
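For context, adversarial training optimizes the model on worst-case perturbed inputs rather than on clean ones. The following is a minimal sketch of standard PGD-based adversarial training (Madry et al.) in PyTorch; the model, data loader, and hyperparameters are illustrative placeholders, not the exact setup used in the thesis.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Projected gradient descent attack within an L-infinity ball of radius eps."""
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()         # ascent step on the loss
            delta.clamp_(-eps, eps)                    # project back into the eps-ball
            delta.copy_((x + delta).clamp(0, 1) - x)   # keep the perturbed input in [0, 1]
        delta.grad.zero_()
    return (x + delta).detach()

def adversarial_training_epoch(model, loader, optimizer):
    """One epoch of training on adversarial examples instead of clean inputs."""
    model.train()
    for x, y in loader:
        x_adv = pgd_attack(model, x, y)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```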
Second, we propose a method to improve the training labels in the well-defined AMC task. We leverage the maximum likelihood (ML) framework to create better labels and show that this method improves the robustness of the model and generalizes to unseen channel conditions. Then, to also improve the labels in tasks that are not well-defined, such as image classification, we propose to use knowledge distillation (KD). We provide general guidelines for improving the adversarial performance of a student model trained on those labels. We use our insights to enhance state-of-the-art robust models and find that the increase in performance depends on the quantity and quality of the data and is concentrated in the regions that the model has most difficulty learning.
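To make the label-improvement idea concrete, the snippet below is a minimal sketch of the standard knowledge-distillation objective (Hinton et al.), in which a teacher's soft predictions complement the hard labels; the temperature T and mixing weight alpha are illustrative hyperparameters, not the guidelines derived in the thesis.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend of soft (teacher) and hard (ground-truth) supervision."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                               # rescale so gradient magnitudes match the hard loss
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```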
Finally, we study the training data and how the different regions of the input space contribute to the final robustness of the model. In particular, we focus on the sample margin, the minimum distance from a sample to the model's classification boundary. Using the perceptron learning task, we find that adversarial training with the highest-margin samples is only useful when the dataset is large and there are no low-margin samples that cross the classification boundary when perturbed. Motivated by this insight, we propose PUMA, a novel strategy that uses DeepFool to estimate the margin in image classification, removes the high-margin samples, and jointly adjusts the training attack norm on the samples that may change class when perturbed. We show that PUMA can enhance the current state-of-the-art robust training methodology, significantly improving the model's accuracy while maintaining similar robustness.
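As an illustration of the margin-based selection idea, the sketch below estimates each sample's margin with a one-step DeepFool-style linearization and keeps only the samples whose estimated margin falls below a threshold. The threshold `tau` and the helper names are hypothetical, and the sketch does not reproduce the full PUMA procedure (which also adjusts the training attack norm).

```python
import torch

def deepfool_margin(model, x, y, num_classes):
    """One-step linearized estimate of the L2 distance to the closest class boundary."""
    x = x.detach().clone().requires_grad_(True)
    logits = model(x)
    grad_true = torch.autograd.grad(logits[0, y], x, retain_graph=True)[0]
    margins = []
    for k in range(num_classes):
        if k == y:
            continue
        grad_k = torch.autograd.grad(logits[0, k], x, retain_graph=True)[0]
        w = (grad_k - grad_true).flatten()                 # linearized boundary normal
        f = (logits[0, k] - logits[0, y]).abs()            # logit gap to class k
        margins.append((f / (w.norm() + 1e-12)).item())
    return min(margins)

def filter_high_margin(model, dataset, num_classes, tau):
    """Keep only the samples whose estimated margin is below the threshold tau."""
    kept = []
    for x, y in dataset:
        if deepfool_margin(model, x.unsqueeze(0), y, num_classes) <= tau:
            kept.append((x, y))
    return kept
```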
In summary, our work shows that the lack of data scalability of existing approaches may stem from the fact that they try to fit the data rather than the task. We show that robustness is essential for learning better features and that better performance can be achieved by modifying the training data. To this end, we propose new algorithms for different applications such as AMC and image classification.