Welcome to Accidental Vulnerability!
- Preliminaries:
  - Please follow HarmBench's installation manual to install the adversarial evaluation tooling used in this project.
  - Please set up a compute cluster with the appropriate environment variables and GPU resources (a quick sanity-check sketch follows this list).
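A minimal sanity check along these lines can confirm that the cluster environment and GPUs are visible before submitting jobs. This is an illustrative sketch, not part of the repository; HF_HOME and HARMBENCH_PATH are assumed variable names, not ones the project requires.

```python
# sanity_check.py -- illustrative environment check before submitting SLURM jobs.
import os

import torch

# Report GPU visibility (the adversarial evaluations require GPU resources).
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")

# Report a few environment variables commonly set on compute clusters.
# NOTE: HARMBENCH_PATH is a hypothetical name used here only for illustration.
for var in ("HF_HOME", "CUDA_VISIBLE_DEVICES", "HARMBENCH_PATH"):
    print(f"{var} = {os.environ.get(var, '<unset>')}")
```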
- Folder and File Descriptions:
- slurm: This folder contains scripts that can be migrated onto your cluster to generate test cases and run adversarial evaluations. Each completed evaluation provides a binary classification for 80 individual cases, which should be integrated into the Subset-ASR tables in subset_asr.py using the prompt classifications from Appendix C of our paper (see the aggregation sketch after this list).
- s1.sh: HarmBench (Step 1) SLURM script
- s1_5.sh: HarmBench (Step 1.5) SLURM script
- s2.sh: HarmBench (Step 2) SLURM script
- s3.sh: HarmBench (Step 3) SLURM script
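As a rough illustration of the integration step, the sketch below aggregates the 80 per-case binary classifications into subset-specific ASRs. The JSON layout and the subset assignments are assumptions made for illustration; the actual prompt classifications come from Appendix C of the paper and are entered into subset_asr.py.

```python
# subset_asr_from_results.py -- illustrative sketch, not part of the repository.
# Assumes the final HarmBench output is a JSON list of 80 records with a binary
# "label" field (1 = attack succeeded); case_to_subset stands in for the prompt
# classifications from Appendix C of the paper.
import json
from collections import defaultdict

def subset_asr(results_path: str, case_to_subset: dict[str, str]) -> dict[str, float]:
    """Compute the attack success rate (ASR) for each prompt subset."""
    with open(results_path) as f:
        results = json.load(f)  # e.g. [{"case_id": "...", "label": 0 or 1}, ...]

    totals, successes = defaultdict(int), defaultdict(int)
    for record in results:
        subset = case_to_subset[record["case_id"]]
        totals[subset] += 1
        successes[subset] += record["label"]

    # ASR per subset = successful cases / total cases in that subset.
    return {subset: successes[subset] / totals[subset] for subset in totals}
```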
- correlational_analysis_visualizations: This folder contains the code that computes correlations with our measured attack success rates (ASRs) and generates the visualizations for our subset ASRs and figures (see the correlation sketch after this list).
  - correlation_plot.py: This file generates a visualization of all correlations with ASR.
  - correlation_spearman.py: This file correlates our HarmBench-obtained ASRs with the dataset metrics saved by individual_metric_calculations.py.
  - individual_metric_calculations.py: This file loads a dataset and saves the mean of each potential factor to a file for correlation_spearman.py.
- subset_asr.py: This file uses our classification of HarmBench-obtained subset-specific ASRs to generate heatmaps for further visualization.
- top6_correlation_visualizations.py: This file provides visualizations of the top 6 correlated metrics across all datasets.
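For orientation, a Spearman correlation between per-dataset ASRs and a saved metric can be computed along the following lines. The CSV file name and column names are assumptions for illustration; the actual logic lives in correlation_spearman.py.

```python
# spearman_sketch.py -- illustrative only; the file layout and column names are
# assumptions, not the actual format written by individual_metric_calculations.py.
import pandas as pd
from scipy.stats import spearmanr

# One row per dataset: its HarmBench-obtained ASR and a candidate metric.
df = pd.read_csv("dataset_metrics_and_asr.csv")  # hypothetical file name

rho, p_value = spearmanr(df["mean_metric"], df["asr"])
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3g})")
```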
- finetuning: This folder contains a sample script that loads a dataset and fine-tunes models using the hyperparameters mentioned in our paper (a minimal fine-tuning sketch follows this list).
- sample_script.py: Fine-tuning training script with hyperparameters
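For reference, a bare-bones fine-tuning loop in the spirit of sample_script.py might look like the following. The model name, dataset, and hyperparameter values below are placeholders, not the settings reported in the paper.

```python
# finetune_sketch.py -- minimal causal-LM fine-tuning sketch (illustrative only).
# The model, dataset, and hyperparameters are placeholders; use the values from
# the paper (and sample_script.py) for actual runs.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder dataset: a local JSON file with a "text" column.
dataset = load_dataset("json", data_files="train.json")["train"]
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="finetuned-model",
        per_device_train_batch_size=4,   # placeholder hyperparameters
        gradient_accumulation_steps=4,
        learning_rate=2e-5,
        num_train_epochs=3,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("finetuned-model")
```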
- horizontal_evaluations: This folder contains the code for the newly added horizontal evaluations.