
Welcome to Accidental Vulnerability!

Preliminaries:

Please follow HarmBench's installation instructions to obtain the adversarial evaluation methods used in this project.
Please set up a compute cluster with the appropriate environment variables and GPU resources (the scripts in the slurm folder target a SLURM scheduler).

Folder and File Descriptions:

slurm: This folder contains SLURM scripts that can be copied to your cluster to generate test cases and run the adversarial evaluations. Each completed evaluation yields a binary classification (attack successful or not) for each of 80 test cases. These results should be entered into the Subset-ASR tables in subset_asr.py, grouped according to the prompt classifications in Appendix C of our paper.
- s1.sh: HarmBench Step 1 (generate test cases) SLURM script
- s1_5.sh: HarmBench Step 1.5 (merge test cases) SLURM script
- s2.sh: HarmBench Step 2 (generate completions) SLURM script
- s3.sh: HarmBench Step 3 (evaluate completions) SLURM script
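Each completed evaluation produces one binary label per test case, and a subset ASR is simply the share of successful attacks within a prompt category. The sketch below shows that aggregation under stated assumptions: the labels are available as a mapping from test-case ID to 0/1, and the Appendix C prompt classifications as a mapping from test-case ID to category name. The function name `subset_asr`, both data layouts, and the category names are hypothetical and are not taken from subset_asr.py.

```python
from collections import defaultdict


def subset_asr(labels: dict[str, int], categories: dict[str, str]) -> dict[str, float]:
    """Compute the attack success rate (ASR, in percent) per prompt category.

    labels:     hypothetical mapping test-case ID -> 1 (attack succeeded) or 0.
    categories: hypothetical mapping test-case ID -> category name
                (e.g. transcribed from the Appendix C prompt classifications).
    """
    successes: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for case_id, label in labels.items():
        if label not in (0, 1):
            raise ValueError(f"label for {case_id!r} must be 0 or 1, got {label!r}")
        category = categories[case_id]  # raises KeyError if a case is unclassified
        totals[category] += 1
        successes[category] += label
    return {c: 100.0 * successes[c] / totals[c] for c in totals}


if __name__ == "__main__":
    # Toy data with 4 cases; a real evaluation would supply 80 labels.
    labels = {"c1": 1, "c2": 0, "c3": 1, "c4": 1}
    categories = {"c1": "category_a", "c2": "category_a",
                  "c3": "category_b", "c4": "category_b"}
    print(subset_asr(labels, categories))
```

Raising on unclassified cases or non-binary labels is a deliberate choice: a silently dropped case would skew the per-category denominators in the Subset-ASR tables.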
correlational_analysis_visualizations: This folder contains the code that computes correlations between our measured attack success rates (ASRs) and dataset metrics, as well as the code that produces visualizations of our subset ASRs and other figures.
- correlation_plot.py: This file generates a visualization of all metric correlations with ASR.
- correlation_spearman.py: This file correlates our HarmBench-obtained ASRs with the dataset metrics saved by individual_metric_calculations.py.
- individual_metric_calculations.py: This file loads a dataset, computes the mean of each potential factor (candidate dataset metric), and saves these means to a file used by correlation_spearman.py.
- subset_asr.py: This file generates heatmaps of the subset-specific ASRs, based on our classification of the HarmBench results.
- top6_correlation_visualizations.py: This file visualizes the 6 metrics most strongly correlated with ASR across all datasets.
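The correlation step can be sketched as follows, assuming one ASR value per fine-tuning dataset and one mean value per candidate metric per dataset (the kind of per-dataset means individual_metric_calculations.py is described as saving). The function names and data layout are hypothetical. Spearman's rho is computed here as the Pearson correlation of average ranks, which handles ties; `scipy.stats.spearmanr` computes the same coefficient if SciPy is available. The `top_k_metrics` helper ranks metrics by absolute correlation, mirroring the "top 6" selection.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either variable is constant
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


def top_k_metrics(asr: dict[str, float],
                  metrics: dict[str, dict[str, float]],
                  k: int = 6) -> list[tuple[str, float]]:
    """Rank metrics by |rho| against ASR, strongest first.

    asr:     hypothetical mapping dataset name -> ASR.
    metrics: hypothetical mapping metric name -> {dataset name -> mean metric value}.
    """
    datasets = sorted(asr)  # fixed order so metric and ASR values stay aligned
    results = []
    for name, per_dataset in metrics.items():
        rho = spearman([per_dataset[d] for d in datasets], [asr[d] for d in datasets])
        if not math.isnan(rho):  # skip metrics that are constant across datasets
            results.append((name, rho))
    results.sort(key=lambda item: abs(item[1]), reverse=True)
    return results[:k]


if __name__ == "__main__":
    # Toy values for illustration only; they are not results from the paper.
    asr = {"d1": 10.0, "d2": 25.0, "d3": 40.0, "d4": 35.0}
    metrics = {
        "metric_a": {"d1": 0.1, "d2": 0.2, "d3": 0.4, "d4": 0.3},
        "metric_b": {"d1": 5.0, "d2": 3.0, "d3": 1.0, "d4": 2.0},
        "metric_c": {"d1": 1.0, "d2": 4.0, "d3": 2.0, "d4": 3.0},
    }
    for name, rho in top_k_metrics(asr, metrics, k=2):
        print(f"{name}: rho = {rho:+.2f}")
```

Ranking by absolute value keeps strong negative correlations in the top-k list, since a metric that reliably lowers ASR is as informative as one that raises it.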
finetuning: This folder contains a sample script that loads a dataset and fine-tunes models using the hyperparameters reported in our paper.
- sample_script.py: Fine-tuning training script with the paper's hyperparameters
horizontal_evaluations: This folder contains the code for horizontal evaluations (newly added).

psyonp/accidental_vulnerability

Folders and files

Latest commit

History

Repository files navigation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages