Abstract

Deep Neural Networks (DNNs) have achieved great success in a wide range of applications, such as image recognition, object detection, and semantic segmentation. Even though the discriminative power of DNNs is nowadays unquestionable, serious concerns have arisen ever since DNNs were shown to be vulnerable to adversarial examples, crafted by adding imperceptible perturbations to clean images. The implications of these malicious attacks are even more significant for DNNs deployed in real-world systems, e.g., autonomous driving and biometric authentication. Consequently, an intriguing question that we aim to answer is how DNNs behave under adversarial attacks. This thesis contributes to a better understanding of the mechanism of adversarial attacks on DNNs. Our main contributions fall broadly into two directions: (1) we propose interpretable architectures, first to understand the reasons for the success of adversarial attacks and then to improve the robustness of DNNs; (2) we design intuitive adversarial attacks, both to mislead DNNs and to serve as a tool for expanding our present understanding of their internal workings and limitations.

In the first direction, we introduce deep architectures that allow humans to interpret the reasoning process behind DNN predictions. Specifically, we incorporate Bag-of-visual-words representations from the pre-deep-learning era into DNNs using an attention scheme. We identify key reasons for the success of adversarial attacks and use these insights to propose an adversarial defense that maximally separates the latent features of discriminative regions while minimizing the contribution of non-discriminative regions to the final prediction.

The second direction deals with the design of adversarial attacks to understand the limitations of DNNs in real-world environments. To begin with, we show that existing state-of-the-art semantic segmentation networks, which achieve superior performance by exploiting context, are highly susceptible to indirect local attacks. Furthermore, we demonstrate the existence of universal directional perturbations that are quasi-independent of the input template yet still successfully fool unknown Siamese-based visual object trackers. We then identify that mid-level filter banks across different backbones bear strong similarities and can thus serve as a common ground for attack. We therefore learn a generator that disrupts mid-level features with high transferability across different target architectures, datasets, and tasks. In short, our attacks highlight critical vulnerabilities of DNNs that make their deployment in real-world environments challenging, even in the extreme case when the attacker is unaware of the target architecture or the target data used to train it.

Furthermore, we go beyond fooling networks and demonstrate the usefulness of adversarial attacks for studying the internal disentangled representations in self-supervised 3D pose estimation networks. We observe that adversarial manipulation of appearance information in the input image alters the pose output, indicating that the pose code contains appearance information and that disentanglement is far from complete. Besides the above contributions, an underlying theme that arises multiple times in this thesis is counteracting adversarial attacks by detecting them.
