
Adversarial Attacks on Neural Networks: A Survey

adversarial_perturbations.pdf

This repository contains implementations of various optimization methods for generating adversarial examples against convolutional neural networks. It accompanies the research paper "Adversarial Attacks on Neural Networks: A Survey."

Below are some example visualizations from the paper, illustrating the effects of different adversarial attacks.

Figure 1: Untargeted vs. Targeted Attack Example (paper Figure 1). On ResNet-18, the original image is correctly classified as "saluki"; an untargeted attack changes the prediction to "beagle", while a targeted attack forces the classification to "gorilla".

Untargeted vs Targeted Attack Example

Figure 2: Qualitative Comparison of Untargeted Attacks (paper Figure 2). Untargeted attacks on ResNet-18 generated by the different methods for several input images: the originals are correctly classified, and every attack method induces misclassification to various incorrect labels.

Qualitative Comparison of Untargeted Attacks

Figure 3: Grad-CAM Visualization of Model Attention (paper Figure 6.1 / Figure 3 in the PDF). Comparison of adversarial attacks and their effect on model attention using Grad-CAM, illustrating how iterative methods successfully redirect the model's attention in targeted attacks, while single-iteration methods struggle.

Grad-CAM Visualization

Overview

Adversarial examples are inputs modified by carefully crafted perturbations that cause neural networks to misclassify them while remaining visually imperceptible to humans. This project and the accompanying paper provide a systematic survey and empirical comparison of six foundational adversarial attack strategies:

  1. Fast Gradient Sign Method (FGSM)
  2. Fast FGSM (FFGSM)
  3. DeepFool
  4. Carlini & Wagner (C&W)
  5. Projected Gradient Descent (PGD)
  6. Conjugate Gradient (CG)
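
As a quick illustration of the core idea these methods share, the sketch below implements the single-step FGSM update x_adv = x + epsilon * sign(grad_x L(x, y)) in PyTorch. It is a minimal standalone example, not the repository's implementation in src/attacks/fgsm.py; the model, image batch, and labels are placeholders.

import torch
import torch.nn.functional as F

def fgsm_perturb(model, images, labels, epsilon=8/255):
    """Single-step untargeted FGSM: move each pixel by epsilon in the
    direction of the sign of the loss gradient (an L_inf perturbation)."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adv = images + epsilon * images.grad.sign()
    # Clamp back to the valid pixel range.
    return adv.clamp(0, 1).detach()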

Recommended Attack Methods

Based on the findings in our survey, the following attack methods are highlighted:

  1. C&W (Carlini & Wagner): Consistently achieves high success rates with minimal perturbations, particularly for targeted attacks, but incurs high computational overhead.

    # Example for a strong targeted L2 attack (parameters from paper/tests):
    # Assumes test.JPEG exists at the specified path
    python demo.py -i path/to/your/test.JPEG -a cw -n L2 -t -tm least-likely -cv 10 -k 5 -s 500 -lr 0.01 -o results/demo_cw_targeted
    # Example for a default untargeted L2 attack:
    # python demo.py -i path/to/your/test.JPEG -a cw -n L2 -cv 1 -k 0 -s 1000 -lr 0.01 -o results/demo_cw_untargeted
  2. PGD (Projected Gradient Descent): Offers a strong balance between attack effectiveness and computational cost, and is effective for both untargeted and targeted attacks under various L_p norms (a minimal PGD loop is sketched after this list).

    # Example for Linf Untargeted (epsilon=8/255, 40 steps, step_size=eps/4):
    python demo.py -i path/to/your/test.JPEG -a pgd -n Linf -e 0.03137 -s 40 -ss 0.00784 -o results/demo_pgd_linf_untargeted
    # Example for Linf Targeted (epsilon=16/255, 200 steps, step_size=eps/10):
    # python demo.py -i path/to/your/test.JPEG -a pgd -n Linf -t -tm least-likely -e 0.06274 -s 200 -ss 0.00627 -o results/demo_pgd_linf_targeted
  3. DeepFool: Particularly effective at generating untargeted attacks with very small L2 perturbations, though computationally more intensive than FGSM or PGD; DeepFool is inherently an untargeted method.

    # Example (L2 norm is implicit for DeepFool):
    python demo.py -i path/to/your/test.JPEG -a deepfool -s 50 -os 0.02 -o results/demo_deepfool
  4. CG (Conjugate Gradient): Can be more efficient than PGD on certain complex loss landscapes by utilizing approximate second-order information, offering a balance between cost and potency.

    # Example for Linf Untargeted (epsilon=8/255, 40 steps):
    python demo.py -i path/to/your/test.JPEG -a cg -n Linf -e 0.03137 -s 40 -al 0.00784 -o results/demo_cg_linf_untargeted
    # Example for Linf Targeted (epsilon=16/255, 60 steps from paper):
    # python demo.py -i path/to/your/test.JPEG -a cg -n Linf -t -tm least-likely -e 0.06274 -s 60 -al 0.00627 -o results/demo_cg_linf_targeted
  5. FGSM/FFGSM: The fastest methods, suitable for scenarios requiring rapid generation (e.g., adversarial training), but less effective for targeted attacks and against robust models. FFGSM adds a small random initialization step, which can improve on plain FGSM.

    # Example FGSM (Untargeted, Linf, epsilon=4/255):
    python demo.py -i path/to/your/test.JPEG -a fgsm -n Linf -e 0.01568 -o results/demo_fgsm_linf_untargeted
    # Example FFGSM (Untargeted, Linf, epsilon=8/255, alpha=0.1*epsilon):
    python demo.py -i path/to/your/test.JPEG -a ffgsm -n Linf -e 0.03137 -al 0.00313 -o results/demo_ffgsm_linf_untargeted
    # Example FFGSM (Targeted, Linf, epsilon=32/255, alpha=0.02 from paper):
    # python demo.py -i path/to/your/test.JPEG -a ffgsm -n Linf -t -tm least-likely -e 0.12549 -al 0.02 -o results/demo_ffgsm_linf_targeted
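
For reference, the iterative L_inf-bounded update at the heart of PGD (item 2 above) can be written as the minimal loop below. This is an illustrative sketch rather than the code in src/attacks/pgd.py, and the model, inputs, and hyperparameters are placeholders.

import torch
import torch.nn.functional as F

def pgd_linf(model, images, labels, epsilon=8/255, step_size=2/255, steps=40):
    """Minimal untargeted L_inf PGD: repeated signed-gradient ascent steps,
    each followed by projection back into the epsilon-ball around the input."""
    originals = images.clone().detach()
    adv = images.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + step_size * grad.sign()
            # Project onto the L_inf ball, then onto the valid pixel range.
            adv = originals + (adv - originals).clamp(-epsilon, epsilon)
            adv = adv.clamp(0, 1)
    return adv.detach()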

Repository Structure

adversarial-attacks/
├── data/                   # Directory for dataset storage
├── results/                # Experimental results
├── src/                    # Source code
│   ├── attacks/            # Attack implementations
│   │   ├── base.py         # Base attack class
│   │   ├── fgsm.py         # Fast Gradient Sign Method
│   │   ├── ffgsm.py        # Fast FGSM
│   │   ├── deepfool.py     # DeepFool
│   │   ├── cw.py           # Carlini & Wagner
│   │   ├── pgd.py          # Projected Gradient Descent
│   │   └── cg.py           # Conjugate Gradient Method
│   ├── models/             # Model wrappers
│   ├── plot/               # Visualization tools
│   └── utils/              # Utility functions
│       ├── data.py         # Data loading utilities
│       ├── evaluation.py   # Evaluation metrics
│       ├── metrics.py      # Performance metrics
│       └── projections.py  # Projection operations
├── requirements.txt        # Dependencies
└── README.md               # This file

Installation

# Clone the repository
git clone https://github.com/ali-izhar/adversarial-attacks.git
cd adversarial-attacks

# Create a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Usage

Running Experiments

# Compare optimization methods
python experiments/compare_optimizers.py

Implementing Your Own Attacks

You can extend the base attack class to implement your own optimization methods:

from src.attacks.base import BaseAttack

class MyAttack(BaseAttack):
    def __init__(self, model, **kwargs):
        super().__init__(model, **kwargs)
        
    def generate(self, images, labels):
        # Implement your attack here
        pass
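
For example, a toy one-step signed-gradient attack could be written against this interface as follows. The sketch assumes only what the template above shows, plus that the base class stores the wrapped model as self.model; adjust attribute names to match the actual BaseAttack implementation.

import torch
import torch.nn.functional as F

from src.attacks.base import BaseAttack

class SignedGradientAttack(BaseAttack):
    """Toy one-step attack: perturb inputs by epsilon * sign(gradient)."""

    def __init__(self, model, epsilon=8/255, **kwargs):
        super().__init__(model, **kwargs)
        self.epsilon = epsilon

    def generate(self, images, labels):
        images = images.clone().detach().requires_grad_(True)
        # Assumes the base class exposes the wrapped model as self.model.
        loss = F.cross_entropy(self.model(images), labels)
        grad = torch.autograd.grad(loss, images)[0]
        adv = images + self.epsilon * grad.sign()
        return adv.clamp(0, 1).detach()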

Evaluation Metrics

We evaluate each optimization method using criteria discussed in the paper:

  • Attack Effectiveness: Percentage of inputs successfully misclassified (Success Rate).
  • Perturbation Efficiency: Measured by $L_2$ norm, $L_\infty$ norm, and Structural Similarity Index (SSIM); see the sketch after this list.
  • Computational Efficiency: Assessed via average iterations to convergence, total gradient computations, and wall-clock runtime per successful attack.
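
As an illustration of how the perturbation metrics can be computed for a single image pair, consider the sketch below. It is not the code in src/utils/metrics.py; it assumes images are float arrays in [0, 1] with shape (H, W, C) and scikit-image >= 0.19 for the channel_axis argument.

import numpy as np
from skimage.metrics import structural_similarity

def perturbation_metrics(original, adversarial):
    """L2 norm, L_inf norm, and SSIM between an original image and its
    adversarial counterpart (float arrays in [0, 1], shape H x W x C)."""
    delta = adversarial - original
    return {
        "l2": float(np.linalg.norm(delta.ravel())),
        "linf": float(np.abs(delta).max()),
        # channel_axis=-1 treats the last axis as color channels.
        "ssim": float(structural_similarity(original, adversarial,
                                            data_range=1.0, channel_axis=-1)),
    }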

Citation

If you use this code or refer to the findings in your research, please cite our paper:

@article{ali2025survey,
  title={Adversarial Attacks on Neural Networks: A Survey},
  author={Ali, Izhar},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2025}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
