Skip to content

nanduruganesh/rad-attack

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

91 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

RAD Attack

We develop 4 white-box adversarial attack algorithms causing Meta AI's LLaVA to misclassify MNIST and CIFAR-10 images. We develop two primary loss functions that we optimize with gradient descent until the model misclassifies the image. We develop regularized versions of these loss functions that add a weighted term for the deviation of the perturbed image. This repository contains scripts attacking Meta AI's LLaVA model, tuning attack hyperparameters, comparing different attacks, demonstrating transfer attack on Salesforce's BLIP model, and finetuning LLaVA with adversarially perturbed images.

Setup

We use Hugging Face to download our models, and Hugging Face and torchvision to download MNIST and CIFAR. The downloads will require ~16 GB of memory.

The scripts run with limited speed/effectiveness on CPU, so it is highly recommended that CUDA is installed and a GPU is available during runtime.

To activate our environment:

export HF_TOKEN="your_hugging_face_api_token"
source env.sh

Scripts

  • attack.py: Runs the default (no regularization) gradient descent adversarial attack on LLaVA with MNIST image and saves perturbed image to results/

  • attack_cifar.py: Runs default attack on LLaVA with image from CIFAR-10 dataset

  • benchmark_llava.py: runs a simple validation script that passes all 10k of MNIST test images through LLaVA and verifies its output. We observe 88% accuracy.

  • tune_alpha.py: Runs a grid search to tune the hyperparameter associated with the regularizer term in attacks 3 & 4 and outputs summary statistics in results/tune_alpha-{id}.csv

  • compare_attacks.py: Runs an attack on a subset of MNIST and outputs summary statistics in results/compare_attacks-{id}.csv

  • analysis.py: Analyzes and graphs the results of the attack comparison and hyperparameter tuning

  • generate_rad_dataset.py: Generates a dataset of adversarially perturbed MNIST images and stores the images as tensors in rad_data/tensors-{id}/

  • training.py: Finetunes LLaVA on MNIST images and/or perturbed MNIST images and outputs results to results/{rad or reg}-train-{id}.csv

Slurm Scripts

We develop a number of slurm scripts to run our experiments on the University of Virginia's Rivanna HPC.

Write-Up

Our report is uploaded to the repository in report.pdf

About

Regularized Adversarial White-box Attacks on Vision Language Models

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •