conference paper

Sparse Attacks for Manipulating Explanations in Deep Neural Network Models

Ajalloeian, Ahmad • Moosavi-Dezfooli, Seyed Mohsen • Vlachos, Michalis • Frossard, Pascal
Editors: Chen, G • Khan, L • Gao, X • Qiu, M • Pedrycz, W • Wu, X
January 1, 2023
23rd IEEE International Conference on Data Mining, ICDM 2023
23rd IEEE International Conference on Data Mining (IEEE ICDM)

We investigate methods for manipulating classifier explanations while keeping the predictions unchanged. Our focus is on sparse attacks, which seek to alter only a minimal number of input features. We present a novel and efficient algorithm for computing sparse perturbations that alter the explanations but leave the predictions unaffected. We demonstrate that, compared to PGD attacks with an ℓ0 constraint, our algorithm generates sparser perturbations while producing greater discrepancies between the original and manipulated explanations. Moreover, we show that it is possible to conceal the attribution of the k most significant features in the original explanation by perturbing fewer than k features of the input data. We present results for both image and tabular datasets, and emphasize the significance of sparse-perturbation-based attacks for building trustworthy models in high-stakes applications. Our research reveals important vulnerabilities in current explanation methods that should be taken into account when developing reliable ones. Code can be found at https://github.com/ahmadajal/sparse_expl_attacks.
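
The abstract outlines the general recipe: iteratively perturb a small number of input features so that the explanation changes (for example, so that the originally most important features lose their attribution), while rejecting any step that flips the prediction. The following is a minimal, hypothetical PyTorch sketch of an ℓ0-constrained, PGD-style baseline of this idea; it is not the authors' algorithm, and the saliency explanation, loss, step size, and projection are illustrative assumptions. The paper's actual method is available at the repository linked above.

```python
import torch


def saliency(model, x):
    # Gradient-of-logit ("vanilla gradient") explanation for the predicted class.
    # create_graph=True keeps the graph so the explanation itself can later be
    # differentiated with respect to the perturbation.
    logits = model(x)
    score = logits[0, logits[0].argmax()]
    grad, = torch.autograd.grad(score, x, create_graph=True)
    return grad.abs()


def sparse_expl_attack(model, x, k=5, budget=3, steps=100, step_size=0.05):
    # Hypothetical baseline: suppress the attribution of the k originally most
    # important features using a perturbation with at most `budget` nonzero
    # entries, while keeping the predicted class fixed.
    x0 = x.detach()
    x_leaf = x0.clone().requires_grad_(True)
    expl_orig = saliency(model, x_leaf).detach()
    pred_orig = model(x0).argmax(dim=1).item()
    # Indices of the k most important features in the original explanation.
    top_idx = expl_orig.flatten().topk(k).indices

    delta = torch.zeros_like(x0, requires_grad=True)
    for _ in range(steps):
        expl_adv = saliency(model, x0 + delta)
        # Objective: attribution mass remaining on the originally top-k features.
        loss = expl_adv.flatten()[top_idx].sum()
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            candidate = delta - step_size * grad.sign()  # descend on the objective
            # l0 projection: keep only the `budget` largest-magnitude entries.
            flat = candidate.abs().flatten()
            if flat.numel() > budget:
                thresh = flat.topk(budget).values.min()
                candidate = candidate * (candidate.abs() >= thresh)
            # Accept the step only if the predicted class is unchanged.
            if model(x0 + candidate).argmax(dim=1).item() == pred_orig:
                delta.copy_(candidate)
    return (x0 + delta).detach()


if __name__ == "__main__":
    # Toy usage on random data. Softplus keeps second derivatives nonzero,
    # which differentiating through a gradient explanation requires.
    model = torch.nn.Sequential(
        torch.nn.Linear(20, 32), torch.nn.Softplus(), torch.nn.Linear(32, 3)
    )
    x = torch.randn(1, 20)
    x_adv = sparse_expl_attack(model, x, k=5, budget=3, steps=50)
    print("features changed:", int((x_adv != x).sum().item()))
```

Note that a purely ReLU network has zero second derivatives almost everywhere, so differentiating through a gradient-based explanation, as done here, typically assumes a smooth activation (softplus in this toy model).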

Type
conference paper
DOI
10.1109/ICDM58522.2023.00101
Web of Science ID
WOS:001165180100093
Author(s)
Ajalloeian, Ahmad
Moosavi-Dezfooli, Seyed Mohsen
Vlachos, Michalis
Frossard, Pascal
Editors
Chen, G
Khan, L
Gao, X
Qiu, M
Pedrycz, W
Wu, X
Date Issued
2023-01-01
Publisher
IEEE Computer Society
Publisher place
Los Alamitos
Published in
23rd IEEE International Conference on Data Mining, ICDM 2023
ISBN of the book
979-8-3503-0788-7
Start page
918
End page
923
Subjects
Technology
Explainable AI
Deep Neural Networks
Adversarial Attacks
Sparse Perturbation
Fairness
Editorial or Peer reviewed
REVIEWED
Written at
EPFL
EPFL units
LTS4
Event name
23rd IEEE International Conference on Data Mining (IEEE ICDM)
Event place
Shanghai, People's Republic of China
Event date
December 1-4, 2023

Available on Infoscience
April 3, 2024
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/206788