sabrinahaniff/adversarial-image-attacks

Adversarial Image Attacks (FGSM)

A from-scratch implementation of the Fast Gradient Sign Method (FGSM) adversarial attack on a pretrained image classifier. It demonstrates how small pixel-level perturbations can fool a neural network while remaining imperceptible to humans.

The Core Idea

Neural networks don't see images the way humans do; they see grids of numbers. By nudging those numbers in a mathematically precise direction, you can cause a model to misclassify an image, sometimes with high confidence, while the image looks unchanged to the human eye.

This project implements FGSM from scratch in PyTorch, without relying on attack libraries, in order to understand the mechanism at the implementation level.

How FGSM Works

  1. Feed an image to the model and get its prediction
  2. Compute the gradient of the loss with respect to each input pixel, which measures how much changing that pixel increases the model's error
  3. Nudge every pixel by epsilon in the direction given by the sign of that gradient, the direction that increases the model's error fastest
  4. The result looks identical to humans but can fool the model

The entire attack is one line mathematically: perturbation = epsilon × sign(∇loss), where the gradient is taken with respect to the input pixels.
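The steps above can be sketched in PyTorch roughly as follows. This is a minimal illustration rather than the contents of attack.py: the function name `fgsm_attack`, its parameters, and the tiny stand-in classifier are assumptions made for the example, and the project itself attacks a pretrained ResNet50.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fgsm_attack(model, image, label, epsilon):
    """Return an adversarial copy of `image` after a single FGSM step."""
    # Work on a detached copy so the caller's tensor is left untouched.
    image = image.clone().detach().requires_grad_(True)

    # Forward pass, then the loss against the given label.
    logits = model(image)
    loss = F.cross_entropy(logits, label)

    # Backward pass: fills image.grad with d(loss)/d(pixel).
    model.zero_grad()
    loss.backward()

    # perturbation = epsilon * sign(gradient of the loss w.r.t. the input)
    perturbation = epsilon * image.grad.sign()
    return (image + perturbation).detach()


if __name__ == "__main__":
    torch.manual_seed(0)
    # Hypothetical toy classifier, used only so the sketch runs without downloads.
    toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10)).eval()
    x = torch.rand(1, 3, 8, 8)
    y = toy_model(x).argmax(dim=1)  # treat the current prediction as the label
    x_adv = fgsm_attack(toy_model, x, y, epsilon=0.01)
    print("max pixel change:", (x_adv - x).abs().max().item())
```

No pixel moves by more than epsilon, which is why the change stays hard to see. Whether the result should also be clamped back into the valid pixel range depends on how the input was preprocessed, so that step is left out here.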

Results

  • Original: Labrador Retriever (35.6% confidence)
  • Adversarial: Treeing Walker Coonhound (11.4% confidence)
  • Epsilon: 0.01 (a 1% change per pixel, invisible to humans)
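To put the epsilon value in familiar terms, the short calculation below converts it to 8-bit intensity levels. It assumes epsilon is applied on a [0, 1] pixel scale; if the perturbation is instead applied after input normalization, the effective per-pixel change will differ. The helper name `epsilon_to_8bit_levels` is made up for this example.

```python
def epsilon_to_8bit_levels(epsilon, max_level=255):
    """Convert an epsilon on the [0, 1] pixel scale to 8-bit intensity levels."""
    return epsilon * max_level


levels = epsilon_to_8bit_levels(0.01)
print(f"epsilon=0.01 shifts each channel by about {levels:.2f} of 255 levels")
```

Under that assumption, each colour channel moves by roughly two and a half intensity levels out of 255.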

Architecture

attack.py      - loads ResNet50, implements FGSM from scratch
visualize.py   - converts tensors back to images, displays side by side
main.py        - entry point, accepts any image as input
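As an illustration of the tensor-to-image step that visualize.py is described as performing, here is one possible sketch. It assumes the input was normalized with the standard ImageNet mean and standard deviation that torchvision's pretrained models expect. The helper name `tensor_to_image` is hypothetical and is not taken from the repository.

```python
import torch

# Standard ImageNet channel statistics (assumed preprocessing for this sketch).
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)


def tensor_to_image(tensor):
    """Undo ImageNet normalization and return an H x W x 3 NumPy array in [0, 1]."""
    image = tensor.detach().cpu().squeeze(0)      # drop the batch dimension if present
    image = image * IMAGENET_STD + IMAGENET_MEAN  # reverse (x - mean) / std
    image = image.clamp(0.0, 1.0)                 # keep values displayable
    return image.permute(1, 2, 0).numpy()         # channels-last for plotting
```

The returned array can be passed to matplotlib's `imshow` to show the original and adversarial images side by side.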

Usage

python3 -m venv venv && source venv/bin/activate
pip install torch torchvision matplotlib numpy Pillow
python main.py cat.jpg
python main.py dog.jpg

Why This Matters

Evasion attacks using adversarial examples pose real risks to deployed AI systems. Examples include a self-driving car misreading a stop sign, a medical imaging system misclassifying a scan, and a security camera fooled by a printed patch. Understanding how these attacks work at the implementation level is the first step toward building defenses against them.
