Gradient-based Methods for Deep Model Interpretability

Srinivas, Suraj  
2021

In this dissertation, we propose gradient-based methods for characterizing model behaviour for the purposes of knowledge transfer and post-hoc model interpretation. Broadly, gradients capture the variation of some output feature of the model upon unit variation of an input feature, and thus encode the local model behaviour while remaining agnostic to the underlying architectural choices.
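
As a concrete illustration, here is a minimal PyTorch-style sketch (with a placeholder model, not code from the thesis) of how such input-gradients can be obtained from any differentiable model:

    import torch

    # Placeholder model: any differentiable classifier can be treated the same way.
    model = torch.nn.Sequential(
        torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 3))

    x = torch.randn(1, 10, requires_grad=True)  # input features
    logits = model(x)                           # forward pass
    score = logits[0, logits.argmax()]          # output feature of interest (top-class score)
    score.backward()                            # fills x.grad with d(score)/d(x)
    input_gradient = x.grad                     # local sensitivity of the score to each input feature

Only the forward and backward passes are used, which is why the recipe applies regardless of the architecture.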

Our first contribution is to propose a sample-efficient method to mimic the behaviour of a pre-trained teacher model with an untrained student model using gradient information. We interpret our approach as an efficient alternative to the data augmentation used in canonical knowledge transfer approaches, where noise is added to the inputs. We apply this to knowledge distillation and to a transfer learning task, where we show improved performance on small datasets.
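
As a rough sketch of the general idea of matching input-gradients between a teacher and a student (the objective below is illustrative; the names, weighting, and exact loss in the thesis may differ):

    import torch
    import torch.nn.functional as F

    def distillation_with_gradient_matching(teacher, student, x, lam=1.0):
        """Soft-target distillation plus a penalty on mismatched input-gradients
        (illustrative sketch, not the thesis implementation)."""
        x = x.clone().requires_grad_(True)
        t_logits = teacher(x)
        s_logits = student(x)

        # Standard output-matching term; the teacher is treated as a fixed target.
        out_loss = F.kl_div(F.log_softmax(s_logits, dim=1),
                            F.softmax(t_logits, dim=1).detach(),
                            reduction='batchmean')

        # Gradient-matching term: compare d(top-class score)/dx of both models.
        cls = t_logits.argmax(dim=1, keepdim=True)
        t_grad = torch.autograd.grad(t_logits.gather(1, cls).sum(), x)[0].detach()
        s_grad = torch.autograd.grad(s_logits.gather(1, cls).sum(), x, create_graph=True)[0]

        return out_loss + lam * F.mse_loss(s_grad, t_grad)

Because the gradient term supplies extra supervision per sample, it can play a role similar to training on noisy copies of the inputs.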

Our second contribution is to propose a novel saliency method to visualize the input features that are most relevant for predictions made by a given model. We first propose the full-gradient representation, which satisfies a property called completeness that provably cannot be satisfied by gradient-based saliency methods. Based on this, we propose an approximate saliency map representation called FullGrad, which naturally captures the information within a model across feature hierarchies. Our experimental results show that FullGrad captures model behaviour better than other saliency methods.
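
To make the completeness property concrete, here is a sketch in notation of my own rather than the thesis': for a ReLU network f with implicit bias parameters b ∈ B, the output admits a decomposition of the form

    \[
    f(\mathbf{x}) \;=\; \nabla_{\mathbf{x}} f(\mathbf{x})^{\top} \mathbf{x} \;+\; \sum_{b \in \mathcal{B}} f^{b}(\mathbf{x}),
    \]

where the f^b(x) are bias-gradient terms, one per bias; FullGrad aggregates the input-gradient term together with the per-layer bias-gradient terms into a single saliency map, so that the map accounts for the entire output rather than only its input-linear part.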

Our final contribution is to take a step back and ask why input-gradients are informative for standard neural network models in the first place, especially when their structure may well be arbitrary. Our analysis reveals that, for a subset of gradient-based saliency maps, the map relies not on the underlying discriminative model p(y | x) but on a hidden density model p(x | y) implicit within softmax-based discriminative models. We thus find that input-gradients are informative because this implicit density model aligns with the ground-truth data density, which we verify experimentally.
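
The observation behind this result can be sketched as follows (notation mine): the softmax probabilities are invariant to adding any class-independent term to the logits f_i(x), and this unconstrained degree of freedom lets the same logits be read as an unnormalized class-conditional density

    \[
    p(\mathbf{x} \mid y = i) \;\propto\; \exp\big(f_i(\mathbf{x})\big).
    \]

Since the normalizing constant does not depend on x, the input-gradient \nabla_x f_i(x) coincides with the score \nabla_x \log p(x | y = i) of this implicit density model, rather than being a derivative of the discriminative model p(y | x).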

Type: doctoral thesis
DOI: 10.5075/epfl-thesis-8606
Author(s): Srinivas, Suraj
Advisors: Fleuret, François • Frossard, Pascal
Jury: Prof. Anja Skrivervik Favre (president); Prof. François Fleuret, Prof. Pascal Frossard (thesis directors); Prof. Alexandre Alahi, Dr. Been Kim, Dr. Ludovic Denoyer (examiners)
Date Issued: 2021
Publisher: EPFL
Publisher place: Lausanne
Public defense date: 2021-11-11
Thesis number: 8606
Number of pages: 133
Subjects: Deep neural networks • knowledge transfer • distillation • saliency maps • interpretability
EPFL units: LIDIAP
Faculty: STI
School: IEM
Doctoral School: EDEE
Available on Infoscience: October 28, 2021
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/182593