This repository contains the main research code for our paper "On the Feasibility of Poisoning Text-to-Image AI Models via Adversarial Mislabeling" by S. Wu, R. Bhaskar, A. Ha, S. Shan, H. Zheng, and B. Zhao (accepted to CCS 2025).
The figure above (Figure 1 in the paper) illustrates the full pipeline of our poisoning attack on text-to-image diffusion models. This repository contains the code we used for Parts A and B, as follows:
- [Code for Part A.] Generating adversarial images that fool VLMs (`./adversarial_mislabeling_attack`)
  - We include our targeted white-box attack (Section 5 in the paper) against all three VLMs we evaluated (CogVLM, xGen-MM, and LLaVA). An illustrative attack sketch is shown after this list.
  - Setup and usage can be found in its own README.
- [Code for Part B.] Fine-tuning text-to-image models (`./fine_tuning`)
  - We include our fine-tuning scripts for all three text-to-image models we evaluated (SD21, SDXL, and FLUX). A minimal training-step sketch is also shown below.
  - Setup and usage can be found in its own README.
- Miscellaneous implementations of our project (`./misc`)
  - Evaluation metrics (`./misc/metrics`)
  - Concept selection from Section 4.2 in the paper (`./misc/concept_selection`)
  - Setup and usage can be found in its own README.
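The core technique in Part A is a targeted, gradient-based (PGD-style) perturbation: the image is optimized so the victim VLM emits an attacker-chosen caption while the change stays visually small. The sketch below is a minimal illustration of that idea, not the repository's implementation; it uses BLIP as a compact stand-in captioner (our attack code targets CogVLM, xGen-MM, and LLaVA), and the image path, target caption, and budget/step-size/iteration values are assumptions for the example.

```python
# Minimal PGD-style targeted mislabeling sketch against a captioning VLM.
# BLIP is only a stand-in with a compact API; paths and hyperparameters are illustrative.
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base").to(device).eval()

image = Image.open("source_image.jpg").convert("RGB")   # hypothetical source image
target_caption = "a photo of a cat sitting on a couch"  # attacker-chosen (mis)label

inputs = processor(images=image, text=target_caption, return_tensors="pt").to(device)
pixel_values, input_ids = inputs["pixel_values"], inputs["input_ids"]

eps, alpha, steps = 8 / 255, 1 / 255, 200                # illustrative budget / step size / iterations
delta = torch.zeros_like(pixel_values, requires_grad=True)

for _ in range(steps):
    # Teacher-forced loss of the *target* caption given the perturbed image.
    out = model(pixel_values=pixel_values + delta, input_ids=input_ids, labels=input_ids)
    out.loss.backward()
    with torch.no_grad():
        delta -= alpha * delta.grad.sign()               # descend: make the target caption more likely
        delta.clamp_(-eps, eps)                          # simplified L-inf budget in normalized pixel space
    delta.grad.zero_()

with torch.no_grad():                                    # check what the captioner now says
    caption_ids = model.generate(pixel_values=pixel_values + delta)
print(processor.decode(caption_ids[0], skip_special_tokens=True))
```

See the README in `./adversarial_mislabeling_attack` for the actual attack code and the settings used against each VLM.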
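Part B then fine-tunes text-to-image models on image-caption pairs (clean or poisoned). As context, the sketch below shows one standard latent-diffusion training step with Hugging Face `diffusers` on Stable Diffusion 2.1 components; it is a simplified illustration under assumed defaults (single step, no EMA, gradient accumulation, or mixed precision), not the repository's training scripts for SD21, SDXL, or FLUX.

```python
# One latent-diffusion training step on (image, caption) pairs -- simplified illustration.
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "stabilityai/stable-diffusion-2-1"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder").to(device).eval()
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae").to(device).eval()
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet").to(device).train()
noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

def training_step(images, captions):
    """images: float tensor in [-1, 1], shape (B, 3, H, W); captions: list of strings."""
    with torch.no_grad():
        latents = vae.encode(images.to(device)).latent_dist.sample() * vae.config.scaling_factor
        ids = tokenizer(captions, padding="max_length", truncation=True,
                        max_length=tokenizer.model_max_length, return_tensors="pt").input_ids.to(device)
        text_emb = text_encoder(ids)[0]
    noise = torch.randn_like(latents)
    t = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                      (latents.shape[0],), device=device)
    noisy_latents = noise_scheduler.add_noise(latents, noise, t)
    pred = unet(noisy_latents, t, encoder_hidden_states=text_emb).sample
    # SD 2.1 (768) predicts "v"; older checkpoints predict the noise itself.
    if noise_scheduler.config.prediction_type == "v_prediction":
        target = noise_scheduler.get_velocity(latents, noise, t)
    else:
        target = noise
    loss = F.mse_loss(pred, target)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# Example call with a dummy batch; real training iterates over a captioned image dataset.
dummy_images = torch.randn(2, 3, 512, 512).clamp(-1, 1)
print(training_step(dummy_images, ["a photo of a dog", "a photo of a cat"]))
```

See the README in `./fine_tuning` for the actual scripts and configurations for SD21, SDXL, and FLUX.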
If you use this code or build on our work, please cite our paper:
@inproceedings{wu2025amp,
title={On the Feasibility of Poisoning Text-to-Image AI Models via Adversarial Mislabeling},
author={Wu, Stanley and Bhaskar, Ronik and Ha, Anna Yoo Jeong and Shan, Shawn and Zheng, Haitao and Zhao, Ben Y},
booktitle={ACM SIGSAC Conference on Computer and Communications Security},
year={2025},
}
For any questions, please email stanleywu@cs.uchicago.edu.