SparseFool: a few pixels make a big difference

Deep neural networks have achieved extraordinary results on image classification tasks, but they have been shown to be vulnerable to carefully crafted perturbations of the input data. Although most attacks change the values of many of an image's pixels, deep networks have also been shown to be vulnerable to sparse alterations of the input. However, no \textit{efficient} method has been proposed to compute sparse perturbations. In this paper, we exploit the low mean curvature of the decision boundary and propose SparseFool, a geometry-inspired sparse attack that controls the sparsity of the perturbations. Extensive evaluations show that our approach outperforms related methods and scales to high-dimensional data. We further analyze the transferability and the visual effects of the perturbations, and show the existence of semantic information shared across images and networks. Finally, we show that adversarial training using $\ell_\infty$ perturbations can slightly improve robustness against sparse additive perturbations.
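
The abstract does not spell out the algorithm, but the geometric intuition behind a sparse attack on a low-curvature boundary can be illustrated. If the decision boundary near an image is approximated by a hyperplane with normal w passing through a boundary point x_b, then the l1-minimal step from x onto that hyperplane changes only the single coordinate with the largest |w_j|, because l_inf is the dual norm of l1. The sketch below is an illustrative assumption, not the authors' implementation: the function name `sparse_project_to_hyperplane`, the inputs w and x_b (assumed to come from some linearization of the classifier), and the greedy clip-and-continue handling of the valid pixel range [lo, hi] are all hypothetical choices made for this example.

```python
from typing import List, Tuple


def sparse_project_to_hyperplane(
    x: List[float],
    w: List[float],
    x_b: List[float],
    lo: float = 0.0,
    hi: float = 1.0,
    tol: float = 1e-9,
) -> Tuple[List[float], List[int]]:
    """Hypothetical sketch: move x onto the hyperplane {z : w . (z - x_b) = 0}
    by changing as few coordinates as possible, greedily, while keeping every
    coordinate inside the valid range [lo, hi]."""
    z = list(x)
    changed: List[int] = []
    # Only coordinates with a non-zero normal component can move z toward the plane.
    available = {i for i, wi in enumerate(w) if wi != 0.0}

    def residual(v: List[float]) -> float:
        # Signed, unnormalised distance of v from the hyperplane.
        return sum(wi * (vi - bi) for wi, vi, bi in zip(w, v, x_b))

    r = residual(z)
    while abs(r) > tol and available:
        # The l1-minimal step onto a hyperplane moves only the coordinate
        # with the largest |w_j|.
        j = max(available, key=lambda i: abs(w[i]))
        available.remove(j)
        target = z[j] - r / w[j]          # value that would zero the residual
        z[j] = min(hi, max(lo, target))   # clip to the valid pixel range
        changed.append(j)
        r = residual(z)                   # non-zero only if clipping occurred
    return z, changed
```

For example, with x = [0.5, 0.5, 0.5], w = [0.1, 2.0, -0.5] and x_b = [0.6, 0.6, 0.6], only coordinate 1 is modified (to 0.58). When clipping prevents a single coordinate from reaching the hyperplane, the loop falls back to the coordinate with the next largest |w_j|, so sparsity degrades gracefully rather than failing outright.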

Record created 2018-11-08, last modified 2019-12-05