Deep convolutional neural networks (DCNNs) have significantly advanced the field of computer vision and are often considered plausible models of the human visual system. However, they remain vulnerable to adversarial attacks, common corruptions of input samples, and background bias, among other weaknesses. This thesis investigates the potential of leveraging human vision as a model for enhancing the robustness and adaptability of DCNNs. The investigation commences with a review of human-vision-inspired DCNN models. We find that while these models improve robustness over traditional DCNNs, the gains are sporadic and fall short of state-of-the-art data augmentation techniques. This underlines the need to shift from directly incorporating elements of human vision into DCNNs to exploring alternative routes to robustness. Taking inspiration from the adaptability of human vision, this thesis proposes new ways to make DCNN models more robust by mimicking that adaptability.
We first propose MUFIA, an adaptive evaluation framework that aims to identify and generate a wide range of real-world image corruptions not accounted for by current benchmark datasets. MUFIA manipulates the frequency components of an image to generate an adversarial corruption. These corruptions pinpoint specific vulnerabilities of DCNN models while preserving the semantic integrity of images. Our findings reveal the inadequacy of current data augmentation strategies against the realistic corruptions produced by MUFIA, underscoring the need for defense mechanisms that are robust to a broader array of corruptions.
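To make the idea of frequency-domain corruption concrete, the following is a minimal sketch, not the actual MUFIA algorithm: it scales an image's Fourier components with a per-frequency mask and inverts the transform. In an adversarial setting, such a mask would be optimized to maximize the model's loss; here we simply use a fixed low-pass mask, and all function and variable names are hypothetical.

```python
import torch

def frequency_corruption(image: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Illustrative frequency-domain corruption (not the actual MUFIA method).
    image: (C, H, W) float tensor in [0, 1]; mask: (H, W) non-negative
    per-frequency scaling factors."""
    spectrum = torch.fft.fft2(image)              # 2-D FFT over each channel
    corrupted = torch.fft.ifft2(spectrum * mask)  # rescale frequency components
    return corrupted.real.clamp(0.0, 1.0)         # map back to a valid image

# Example: a low-pass mask that attenuates high frequencies (a blur-like corruption).
H = W = 32
fy = torch.fft.fftfreq(H).abs().unsqueeze(1)  # (H, 1) vertical frequencies
fx = torch.fft.fftfreq(W).abs().unsqueeze(0)  # (1, W) horizontal frequencies
mask = ((fy + fx) < 0.25).float()             # keep low, drop high frequencies
x = torch.rand(3, H, W)
x_corrupted = frequency_corruption(x, mask)
```

Because the mask acts multiplicatively in the frequency domain, it can reshape an image's spectrum without touching its spatial content directly, which is what allows such corruptions to degrade model predictions while leaving the image semantically intact.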
We then create EREN, an adaptive pre-processing scheme inspired by the human visual system's innate ability to adapt inputs for enhanced perception. Unlike traditional image processing methods developed with human perceptual biases in mind, EREN is tailored to the unique biases of DCNNs, proposing a paradigm in which inputs are adjusted to align with model-specific biases. This approach not only improves model performance under corruptions but also signifies a shift towards leveraging the nuanced preferences of DCNNs for robust optimization.
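A minimal sketch of model-aware input adaptation, assuming a frozen PyTorch classifier: among a small family of candidate gamma corrections, pick the one under which the model is most confident (lowest predictive entropy). This is an illustrative stand-in for the idea of adapting inputs to a model's biases, not the EREN scheme itself.

```python
import torch

@torch.no_grad()
def adapt_input(model: torch.nn.Module, image: torch.Tensor,
                gammas=(0.5, 0.75, 1.0, 1.5, 2.0)) -> torch.Tensor:
    """Illustrative model-aware pre-processing (not the EREN scheme).
    image: (1, C, H, W) in [0, 1]. Returns the candidate enhancement
    under which the frozen classifier is most confident."""
    best, best_entropy = image, float("inf")
    for g in gammas:
        candidate = image.clamp(min=1e-6) ** g               # gamma correction
        probs = model(candidate).softmax(dim=-1)
        entropy = -(probs * probs.clamp(min=1e-12).log()).sum().item()
        if entropy < best_entropy:                           # lower entropy =
            best, best_entropy = candidate, entropy          # higher confidence
    return best
```

The key design point is that the selection criterion comes from the model, not from human aesthetics: the "best" version of an image is the one the network itself processes most reliably.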
Lastly, we explore an alternative avenue for bolstering DCNN robustness through CLAD, an innovative data augmentation strategy inspired by human vision. CLAD recalibrates model biases to prioritize foreground elements and shapes over background details, mirroring the human visual system's focus on shape and context. This method diminishes the influence of background bias on model accuracy, illustrating the benefits of integrating human-vision-inspired biases into machine learning models for increased robustness and generalization.
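A minimal sketch in the spirit of foreground-focused augmentation, illustrative only and not the actual CLAD pipeline, assuming foreground segmentation masks are available: paste the masked foreground onto an unrelated background so that labels cannot be inferred from the background alone.

```python
import torch

def swap_background(fg_image: torch.Tensor, fg_mask: torch.Tensor,
                    bg_image: torch.Tensor) -> torch.Tensor:
    """Illustrative background-swap augmentation (not the actual CLAD strategy).
    fg_image, bg_image: (C, H, W) in [0, 1]; fg_mask: (1, H, W) binary mask
    marking the foreground object. The composite keeps the foreground's label."""
    return fg_mask * fg_image + (1.0 - fg_mask) * bg_image

# Usage: combine a masked object with a random unrelated background.
fg, bg = torch.rand(3, 32, 32), torch.rand(3, 32, 32)
mask = (torch.rand(1, 32, 32) > 0.5).float()
augmented = swap_background(fg, mask, bg)
```

Training on such composites penalizes reliance on background cues, nudging the model toward the shape- and foreground-driven decisions that characterize human vision.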
Overall, this thesis highlights the complexities of incorporating insights from human vision into DCNNs. Directly embedding human visual mechanisms within DCNNs has yet to yield consistent improvements. However, our research showcases the promise of leveraging human-vision-inspired biases and adaptations for enhancing DCNN robustness against various input changes.