A from-scratch implementation of the Fast Gradient Sign Method (FGSM) adversarial attack on a pretrained image classifier. It demonstrates how small pixel-level perturbations can fool a neural network while remaining imperceptible to humans.
Neural networks don't see images the way humans do; instead, they see grids of numbers. By nudging those numbers in a mathematically precise direction, you can cause a model to misclassify an image, sometimes with high confidence, while the image looks unchanged to a human eye.
This project implements FGSM from scratch in PyTorch, without relying on attack libraries, in order to understand the mechanism at the implementation level.
- Feed an image to the model and get its prediction
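- Compute the gradient of the loss with respect to each input pixel, which measures how much changing that pixel would increase the model's error
- Nudge every pixel by epsilon in the direction of its gradient's sign, which (to a first-order approximation) maximally increases the model's error
- The result looks identical to humans but can change the model's prediction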
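Why the sign of the gradient? A short derivation, assuming a first-order (linear) approximation of the loss L around the input x and a per-pixel budget of epsilon (an L∞ constraint, ‖δ‖∞ ≤ ε):

$$
L(x + \delta) \approx L(x) + \nabla_x L(x)^\top \delta,
\qquad
\max_{\|\delta\|_\infty \le \epsilon} \nabla_x L(x)^\top \delta
= \epsilon \,\|\nabla_x L(x)\|_1
\quad \text{at} \quad
\delta^\star = \epsilon \,\operatorname{sign}\!\big(\nabla_x L(x)\big).
$$

Each pixel contributes independently to the dot product, so the best choice under this approximation is to push every pixel by the full ε in whichever direction its gradient points.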
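The entire attack is one line mathematically: perturbation = epsilon × sign(∇ₓ loss), where ∇ₓ loss is the gradient of the loss with respect to the input image (not the model weights).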
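A minimal PyTorch sketch of that formula follows. It is not the project's attack.py; the function name `fgsm_attack` and its parameters are hypothetical. It assumes `model` returns raw logits and is already in `eval()` mode, and that `label` holds the true class index.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fgsm_attack(model: nn.Module, image: torch.Tensor, label: torch.Tensor,
                epsilon: float) -> torch.Tensor:
    """Hypothetical helper: return image + epsilon * sign(grad of loss w.r.t. image)."""
    # Track gradients on a copy of the input, not on the caller's tensor.
    image = image.clone().detach().requires_grad_(True)

    # Step 1: forward pass to get the model's logits.
    logits = model(image)

    # Step 2: loss against the true label, backpropagated to the input pixels.
    loss = F.cross_entropy(logits, label)
    model.zero_grad()  # clear stale parameter gradients; only image.grad is used below
    loss.backward()

    # Step 3: move every pixel by epsilon in the direction of its gradient's sign.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.detach()
```

For a pretrained ResNet50, `image` would be a normalized tensor of shape (1, 3, 224, 224) and `label` a tensor like `torch.tensor([class_index])`. If the attack is applied to unnormalized [0, 1] pixels instead, `adversarial.clamp(0.0, 1.0)` keeps the result a valid image.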
Example result:
- Original: Labrador Retriever (35.6% confidence)
- Adversarial: Treeing Walker Coonhound (11.4% confidence)
- Epsilon: 0.01 (each pixel changes by at most 1% of its range, imperceptible to humans)
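To make the 1% figure concrete: assuming pixels are on the [0, 1] scale, ε = 0.01 means each channel value moves by at most 0.01 × 255 = 2.55 of the 255 intensity levels in an 8-bit image. (If epsilon is instead applied after ImageNet normalization, the effective change in raw pixel units is smaller.) The sketch below checks this bound using a random stand-in image and a random stand-in gradient sign, both hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
epsilon = 0.01

# Hypothetical stand-ins: a random 224x224 RGB image on the [0, 1] scale
# and a random array of ±1 values in place of sign(∇ₓ loss).
image = rng.random((224, 224, 3))
grad_sign = rng.choice([-1.0, 1.0], size=image.shape)

# FGSM step, clipped so pixels stay in the valid [0, 1] range.
adversarial = np.clip(image + epsilon * grad_sign, 0.0, 1.0)

# L-infinity norm of the perturbation: the largest change to any single value.
linf = np.abs(adversarial - image).max()

# The same change measured in 8-bit intensity levels (int16 avoids uint8 wraparound).
image_u8 = np.round(image * 255).astype(np.int16)
adversarial_u8 = np.round(adversarial * 255).astype(np.int16)
max_level_change = np.abs(adversarial_u8 - image_u8).max()

print(f"L-inf change: {linf:.4f}, max 8-bit change: {max_level_change} levels")
```

Rounding to whole intensity levels means a 2.55-level budget shows up as a change of at most 3 levels in the saved image.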
- attack.py: loads ResNet50 and implements FGSM from scratch
- visualize.py: converts tensors back to images and displays them side by side
- main.py: entry point; accepts any image as input
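Converting a tensor back to a displayable image mostly means undoing the input normalization. A minimal sketch, assuming the inputs were normalized with the standard ImageNet mean and standard deviation used by torchvision's pretrained ResNet50 (the function name `to_display_image` is hypothetical), operating on a NumPy array of shape (3, H, W) such as one obtained via `tensor.squeeze(0).detach().cpu().numpy()`:

```python
import numpy as np

# Standard ImageNet channel statistics, shaped to broadcast over (3, H, W).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
IMAGENET_STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)


def to_display_image(chw: np.ndarray) -> np.ndarray:
    """Hypothetical helper: undo normalization, return an (H, W, 3) uint8 image."""
    # Reverse (x - mean) / std to get back to the [0, 1] pixel scale.
    pixels = chw * IMAGENET_STD + IMAGENET_MEAN
    # Adversarial perturbations can push values slightly outside [0, 1].
    pixels = np.clip(pixels, 0.0, 1.0)
    # Channels-first (C, H, W) -> channels-last (H, W, C), then scale to 0-255.
    return np.round(pixels.transpose(1, 2, 0) * 255).astype(np.uint8)
```

The resulting uint8 array can be passed directly to matplotlib's `imshow`, so the original and adversarial images can be shown side by side in two subplots.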
```bash
python3 -m venv venv && source venv/bin/activate
pip install torch torchvision matplotlib numpy Pillow
python main.py cat.jpg
python main.py dog.jpg
```

Evasion attacks using adversarial examples pose real risks to deployed AI systems: a self-driving car misreading a stop sign, a medical imaging system misclassifying a scan, or a security camera fooled by a printed patch. Understanding how these attacks work at the implementation level is the first step toward building defenses against them.
