reference implementation for "explanations can be manipulated and geometry is to blame"

pankessel/adv_explanation_ref

Explanations can be manipulated and Geometry is to blame

Explanation methods aim to make neural networks more trustworthy and interpretable. In this paper, we demonstrate a property of explanation methods that is disconcerting for both of these purposes. Namely, we show that explanations can be manipulated arbitrarily by applying barely perceptible perturbations to the input that keep the network's output approximately constant. We establish theoretically that this phenomenon can be related to certain geometrical properties of neural networks. This allows us to derive an upper bound on the susceptibility of explanations to manipulations. Based on this result, we propose effective mechanisms to enhance the robustness of explanations.

What we do

We manipulate images so their explanation resembles an arbitrary target map. Below you can see our algorithm in action:

In our paper we show how to achieve such manipulations. We discuss their nature and derive an upper bound on how much the explanation can change. Based on this bound we propose β-smoothing, a method that can be applied to any of the considered explanation methods to increase robustness against manipulations.
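The manipulation can be illustrated on a toy problem. The sketch below is not the code in run_attack.py. It assumes an objective of the form ‖h(x_adv) − h_target‖² + γ·(g(x_adv) − g(x))², which pulls the explanation h toward the target map while penalizing any change of the network output g. The two-layer softplus network with a 2-D input, the use of the plain gradient as the explanation, and the finite-difference gradient descent (a stand-in for automatic differentiation) are all illustrative assumptions, as are the hypothetical weights and hyperparameters (BETA, gamma, lr, steps).

```python
import numpy as np

# Hypothetical toy network g(x) = w2 . softplus_beta(W1 x); not one of the models in this repo.
BETA = 5.0
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 2))
w2 = rng.normal(size=8)


def sigmoid(z):
    # Overflow-free identity: sigmoid(z) = (1 + tanh(z / 2)) / 2
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def softplus(z, beta=BETA):
    # (1 / beta) * log(1 + exp(beta * z)), computed stably
    return np.logaddexp(0.0, beta * z) / beta


def g(x):
    """Scalar network output."""
    return w2 @ softplus(W1 @ x)


def h(x):
    """Gradient explanation dg/dx; the derivative of softplus_beta is sigmoid(beta * z)."""
    return W1.T @ (w2 * sigmoid(BETA * (W1 @ x)))


def attack_loss(x_adv, x_orig, h_target, gamma):
    # Assumed objective: match the target explanation while keeping the output fixed.
    explanation_term = np.sum((h(x_adv) - h_target) ** 2)
    output_term = gamma * (g(x_adv) - g(x_orig)) ** 2
    return explanation_term + output_term


def numerical_grad(f, x, eps=1e-5):
    # Central finite differences; stands in for autograd in this sketch.
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad


def manipulate(x_orig, h_target, gamma=10.0, lr=1e-4, steps=2000):
    """Gradient descent on the input; returns the perturbed input and the loss history."""
    def loss(z):
        return attack_loss(z, x_orig, h_target, gamma)

    x_adv = x_orig.copy()
    history = [loss(x_adv)]
    for _ in range(steps):
        x_adv = x_adv - lr * numerical_grad(loss, x_adv)
        history.append(loss(x_adv))
    return x_adv, history


if __name__ == "__main__":
    x = np.array([0.5, -0.3])
    target = np.array([1.0, 0.0])  # hypothetical target explanation
    x_adv, history = manipulate(x, target)
    print("loss:", history[0], "->", history[-1])
    print("output change:", g(x_adv) - g(x))
    print("explanation:", h(x), "->", h(x_adv))
```

The finite-difference loop is only practical because the toy input is two-dimensional; for an image classifier with many thousands of input dimensions, an automatic-differentiation framework would compute the gradient of the loss instead.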

β-smoothing

We have demonstrated that one can drastically change the explanation map while keeping the output of the neural network constant. We argue that this vulnerability can be related to the large curvature of the output manifold of the neural network. We focus on the gradient method. The fact that the gradient can be changed drastically by slightly perturbing the input along the hypersurface of constant network output suggests that the curvature of this hypersurface is large. If we replace the ReLU activations with softplus activations with parameter β and reduce β, we can reduce the curvature of the lines of equal network output. Below you can see the smoothing in action for a two-layer neural network.
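The effect of β can already be checked on a single activation. This sketch assumes the standard parametrization softplus_β(x) = (1/β)·log(1 + exp(βx)), which is the one used by, for example, PyTorch's nn.Softplus. Its first derivative is sigmoid(βx). Its second derivative is β·sigmoid(βx)·(1 − sigmoid(βx)), which peaks at β/4 for x = 0. The gap to ReLU never exceeds log(2)/β. Lowering β therefore bounds the curvature of every activation, at the cost of moving further away from the original ReLU network. The sketch covers a single activation only, not the curvature of the network's level sets.

```python
import math


def softplus(x: float, beta: float) -> float:
    """softplus_beta(x) = log(1 + exp(beta * x)) / beta, evaluated without overflow."""
    z = beta * x
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus_second_derivative(x: float, beta: float) -> float:
    """d^2/dx^2 softplus_beta(x) = beta * s * (1 - s), with s = sigmoid(beta * x)."""
    s = sigmoid(beta * x)
    return beta * s * (1.0 - s)


def relu(x: float) -> float:
    return max(x, 0.0)


if __name__ == "__main__":
    grid = [i / 100.0 for i in range(-300, 301)]  # includes x = 0, where both maxima occur
    for beta in (1.0, 5.0, 25.0, 100.0):
        gap = max(abs(softplus(x, beta) - relu(x)) for x in grid)
        curvature = max(softplus_second_derivative(x, beta) for x in grid)
        print(f"beta={beta:6.1f}  max|softplus-relu|={gap:.4f}  max curvature={curvature:.2f}")
```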

Links

NeurIPS paper

arXiv version

Google Drive

Code

Install

Install dependencies using

pip install -r requirements.txt

Usage

Manipulate an image to reproduce a given target explanation using

python run_attack.py --cuda

For explanation methods other than LRP, you need to enable --beta_growth so that the second derivative of the activations is not zero.

python run_attack.py --cuda --method gradient --beta_growth
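The instructions above say only that --beta_growth makes the second derivative of the activations non-zero. A hypothetical one-dimensional two-layer network shows why that matters. With ReLU, the gradient explanation is piecewise constant in the input, so its derivative, which a gradient-based attack needs to steer the explanation, is zero almost everywhere. With softplus, the derivative is non-zero. This sketch does not reproduce how run_attack.py implements the flag. The weights W, B, V and the value of β are illustrative.

```python
import math

# Hypothetical 1-D network f(x) = sum_j V[j] * act(W[j] * x + B[j])
W = [1.5, -2.0, 0.7]
B = [0.1, 0.4, -0.2]
V = [1.0, 0.5, -1.2]


def relu_grad(z: float) -> float:
    # Derivative of ReLU: a step function, whose own derivative is 0 almost everywhere
    return 1.0 if z > 0 else 0.0


def softplus_grad(z: float, beta: float = 1.0) -> float:
    # Derivative of softplus_beta is sigmoid(beta * z), a smooth function
    t = beta * z
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def explanation(x: float, act_grad) -> float:
    """Gradient explanation df/dx of the toy network."""
    return sum(v * w * act_grad(w * x + b) for w, b, v in zip(W, B, V))


def explanation_slope(x: float, act_grad, eps: float = 1e-4) -> float:
    """Central-difference estimate of d(explanation)/dx, the signal the attack follows."""
    return (explanation(x + eps, act_grad) - explanation(x - eps, act_grad)) / (2.0 * eps)


if __name__ == "__main__":
    x = 1.0
    print("ReLU slope:    ", explanation_slope(x, relu_grad))
    print("softplus slope:", explanation_slope(x, softplus_grad))
```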

Plot softplus explanations for various values of beta using

python plot_expl.py --cuda
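What such a plot compares can also be approximated numerically. This sketch is not plot_expl.py and produces numbers rather than a plot. It uses a hypothetical two-input network with illustrative weights, computes the gradient explanation for several values of β, and reports the distance to the explanation of the same network with ReLU activations. The smaller β, the more the explanation departs from the ReLU one. As β grows, it approaches the ReLU explanation, at least at inputs away from the ReLU kinks.

```python
import math

# Hypothetical toy network: f(x) = sum_j V[j] * act(W1[j] . x), two inputs, three hidden units
W1 = [[1.0, -0.5], [0.3, 0.8], [-1.2, 0.4]]
V = [0.7, -1.1, 0.9]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gradient_explanation(x, beta=None):
    """df/dx for softplus_beta activations; beta=None means ReLU (derivative = step function)."""
    expl = [0.0, 0.0]
    for row, v in zip(W1, V):
        z = row[0] * x[0] + row[1] * x[1]
        slope = (1.0 if z > 0 else 0.0) if beta is None else sigmoid(beta * z)
        for i in range(2):
            expl[i] += v * slope * row[i]
    return expl


def distance_to_relu(x, beta):
    # Euclidean distance between the softplus and the ReLU explanation (Python 3.8+)
    return math.dist(gradient_explanation(x, beta), gradient_explanation(x))


if __name__ == "__main__":
    x = [0.5, 0.5]
    for beta in (0.5, 1.0, 5.0, 20.0, 1000.0):
        expl = gradient_explanation(x, beta)
        print(f"beta={beta:7.1f}  explanation=({expl[0]:+.4f}, {expl[1]:+.4f})  "
              f"distance to ReLU={distance_to_relu(x, beta):.4f}")
```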

To download patterns for pattern attribution, please use the following link:

https://drive.google.com/open?id=1RdvAiUZgfhSE8sVF2JOyURpnk1HQ_hZk

Copy the downloaded file into the models subdirectory.

License

This repository is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.
