This repository contains implementations of various optimization methods for generating adversarial examples against convolutional neural networks. It accompanies the research paper "Adversarial Attacks on Neural Networks: A Survey."
Below are some example visualizations from the paper, illustrating the effects of different adversarial attacks.
Figure 1: Untargeted vs. targeted attack example (paper Figure 1). Untargeted and targeted adversarial attack outcomes on ResNet-18: the original image is correctly classified as "saluki", the untargeted attack changes the prediction to "beagle", and the targeted attack forces the classification to "gorilla".
Figure 2: Qualitative comparison of untargeted attacks (paper Figure 2). Untargeted attacks on ResNet-18 generated by the different methods for several input images: the original images are correctly classified, and every attack method induces misclassification to various incorrect labels.
Figure 3: Grad-CAM visualization of model attention (paper Figure 6.1 / Figure 3 in the PDF). Comparison of the adversarial attacks and their effect on model attention using Grad-CAM, illustrating how iterative methods successfully redirect the model's attention in targeted attacks, while single-iteration methods struggle.
Adversarial examples are inputs modified by carefully crafted perturbations that cause a neural network to misclassify them while the changes remain visually imperceptible to humans (a minimal sketch of this idea follows the list below). This project and the accompanying paper provide a systematic survey and empirical comparison of six foundational adversarial attack strategies:
- Fast Gradient Sign Method (FGSM)
- Fast FGSM (FFGSM)
- DeepFool
- Carlini & Wagner (C&W)
- Projected Gradient Descent (PGD)
- Conjugate Gradient (CG)
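To make the core idea concrete, the following is a minimal FGSM-style sketch in PyTorch. It is a conceptual illustration only (not the implementation in `src/attacks/fgsm.py`) and assumes the model returns raw logits and that `image` is a batched tensor with pixel values in [0, 1]:

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, image, label, eps=8 / 255):
    """One-step FGSM: nudge the input in the direction that increases the loss."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Take an eps-sized step along the sign of the gradient (an L-infinity-bounded
    # perturbation), then clip back to the valid pixel range.
    adversarial = image + eps * image.grad.sign()
    return adversarial.clamp(0, 1).detach()
```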
Based on the findings in our survey, the following attack methods are highlighted:
- C&W (Carlini & Wagner): Consistently achieves high success rates with minimal perturbations, particularly for targeted attacks, but incurs high computational overhead.

  ```bash
  # Example for a strong targeted L2 attack (parameters from paper/tests):
  # Assumes test.JPEG exists at the specified path
  python demo.py -i path/to/your/test.JPEG -a cw -n L2 -t -tm least-likely -cv 10 -k 5 -s 500 -lr 0.01 -o results/demo_cw_targeted

  # Example for a default untargeted L2 attack:
  # python demo.py -i path/to/your/test.JPEG -a cw -n L2 -cv 1 -k 0 -s 1000 -lr 0.01 -o results/demo_cw_untargeted
  ```
- PGD (Projected Gradient Descent): Offers a strong balance between attack effectiveness and computational cost. Effective for both untargeted and targeted attacks under various $L_p$ norms.

  ```bash
  # Example for Linf Untargeted (epsilon=8/255, 40 steps, step_size=eps/4):
  python demo.py -i path/to/your/test.JPEG -a pgd -n Linf -e 0.03137 -s 40 -ss 0.00784 -o results/demo_pgd_linf_untargeted

  # Example for Linf Targeted (epsilon=16/255, 200 steps, step_size=eps/10):
  # python demo.py -i path/to/your/test.JPEG -a pgd -n Linf -t -tm least-likely -e 0.06274 -s 200 -ss 0.00627 -o results/demo_pgd_linf_targeted
  ```
- DeepFool: Particularly effective for generating untargeted attacks with very small $L_2$ perturbations, though computationally more intensive than FGSM or PGD. (DeepFool is generally untargeted.)

  ```bash
  # Example (the L2 norm is implicit for DeepFool):
  python demo.py -i path/to/your/test.JPEG -a deepfool -s 50 -os 0.02 -o results/demo_deepfool
  ```
- CG (Conjugate Gradient): Can be more efficient than PGD on certain complex loss landscapes by utilizing approximate second-order information, offering a balance between cost and potency.

  ```bash
  # Example for Linf Untargeted (epsilon=8/255, 40 steps):
  python demo.py -i path/to/your/test.JPEG -a cg -n Linf -e 0.03137 -s 40 -al 0.00784 -o results/demo_cg_linf_untargeted

  # Example for Linf Targeted (epsilon=16/255, 60 steps from paper):
  # python demo.py -i path/to/your/test.JPEG -a cg -n Linf -t -tm least-likely -e 0.06274 -s 60 -al 0.00627 -o results/demo_cg_linf_targeted
  ```
- FGSM/FFGSM: The fastest methods, suitable for scenarios requiring rapid generation (e.g., adversarial training). Less effective for targeted attacks and against robust models. FFGSM adds a small random initialization to potentially improve FGSM (see the sketch after this list).

  ```bash
  # Example FGSM (Untargeted, Linf, epsilon=4/255):
  python demo.py -i path/to/your/test.JPEG -a fgsm -n Linf -e 0.01568 -o results/demo_fgsm_linf_untargeted

  # Example FFGSM (Untargeted, Linf, epsilon=8/255, alpha=0.1*epsilon):
  python demo.py -i path/to/your/test.JPEG -a ffgsm -n Linf -e 0.03137 -al 0.00313 -o results/demo_ffgsm_linf_untargeted

  # Example FFGSM (Targeted, Linf, epsilon=32/255, alpha=0.02 from paper):
  # python demo.py -i path/to/your/test.JPEG -a ffgsm -n Linf -t -tm least-likely -e 0.12549 -al 0.02 -o results/demo_ffgsm_linf_targeted
  ```
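To illustrate the random initialization mentioned in the FGSM/FFGSM item, the sketch below draws a uniform random start inside the $L_\infty$ ball before taking the single gradient-sign step. This is a conceptual sketch under the same assumptions as the FGSM example above, not the code in `src/attacks/ffgsm.py`:

```python
import torch
import torch.nn.functional as F

def ffgsm_example(model, image, label, eps=8 / 255, alpha=10 / 255):
    """FFGSM: random start inside the eps-ball, then one FGSM-style step."""
    # Random initialization drawn uniformly from the L-infinity ball of radius eps.
    adv = image + torch.empty_like(image).uniform_(-eps, eps)
    adv = adv.clamp(0, 1).detach().requires_grad_(True)
    loss = F.cross_entropy(model(adv), label)
    loss.backward()
    # Single gradient-sign step of size alpha, projected back into the eps-ball
    # around the original image and into the valid pixel range.
    adv = adv + alpha * adv.grad.sign()
    adv = torch.min(torch.max(adv, image - eps), image + eps)
    return adv.clamp(0, 1).detach()
```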
```
adversarial-attacks/
├── data/                    # Directory for dataset storage
├── results/                 # Experimental results
├── src/                     # Source code
│   ├── attacks/             # Attack implementations
│   │   ├── base.py          # Base attack class
│   │   ├── fgsm.py          # Fast Gradient Sign Method
│   │   ├── ffgsm.py         # Fast FGSM
│   │   ├── deepfool.py      # DeepFool
│   │   ├── cw.py            # Carlini & Wagner
│   │   ├── pgd.py           # Projected Gradient Descent
│   │   └── cg.py            # Conjugate Gradient Method
│   ├── models/              # Model wrappers
│   ├── plot/                # Visualization tools
│   └── utils/               # Utility functions
│       ├── data.py          # Data loading utilities
│       ├── evaluation.py    # Evaluation metrics
│       ├── metrics.py       # Performance metrics
│       └── projections.py   # Projection operations
├── requirements.txt         # Dependencies
└── README.md                # This file
```
```bash
# Clone the repository
git clone https://github.com/ali-izhar/adversarial-attacks.git
cd adversarial-attacks

# Create a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

To compare the optimization methods:

```bash
# Compare optimization methods
python experiments/compare_optimizers.py
```

You can extend the base attack class to implement your own optimization methods:
```python
from src.attacks.base import BaseAttack


class MyAttack(BaseAttack):
    def __init__(self, model, **kwargs):
        super().__init__(model, **kwargs)

    def generate(self, images, labels):
        # Implement your attack here
        pass
```
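A hypothetical usage example is shown below; the exact keyword arguments accepted by `BaseAttack` may differ from what is assumed here (check `src/attacks/base.py` for the real signature), and `eps` is only an illustrative parameter name:

```python
import torch
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1").eval()

images = torch.rand(4, 3, 224, 224)    # stand-in batch of images in [0, 1]
labels = torch.randint(0, 1000, (4,))  # stand-in ImageNet class labels

attack = MyAttack(model, eps=8 / 255)  # 'eps' is an assumed keyword argument
adv_images = attack.generate(images, labels)
```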
We evaluate each optimization method using the criteria discussed in the paper:

- Attack Effectiveness: Percentage of inputs successfully misclassified (Success Rate).
- Perturbation Efficiency: Measured by the $L_2$ norm, the $L_\infty$ norm, and the Structural Similarity Index (SSIM); see the sketch after this list.
- Computational Efficiency: Assessed via average iterations to convergence, total gradient computations, and wall-clock runtime per successful attack.
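As a concrete illustration of the perturbation-efficiency metrics (not necessarily how `src/utils/metrics.py` computes them), the per-image $L_2$ and $L_\infty$ norms of a perturbation can be obtained as follows:

```python
import torch

def perturbation_norms(original: torch.Tensor, adversarial: torch.Tensor):
    """Per-image L2 and L-infinity norms of the adversarial perturbation.

    Both inputs are batched tensors of shape (N, C, H, W) with values in [0, 1].
    """
    delta = (adversarial - original).flatten(start_dim=1)
    l2 = delta.norm(p=2, dim=1)           # Euclidean size of the perturbation
    linf = delta.abs().max(dim=1).values  # largest single-pixel change
    return l2, linf
```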
If you use this code or refer to the findings in your research, please cite our paper:
```bibtex
@article{ali2025survey,
  title={Adversarial Attacks on Neural Networks: A Survey},
  author={Ali, Izhar},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2025}
}
```
This project is licensed under the MIT License - see the LICENSE file for details.


