The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline (ICML 2024)
🚀 [2024-06-13]: ICML 2024 Oral Presentation!
🔥 [2024-07-18]: Code Released!
If you find our project helpful, please star our repo on GitHub ⭐ to stay updated with our latest features and improvements!
- Release the SilentBadDiffusion Code:
  - Make the SilentBadDiffusion code publicly available.
- Detailed Step-by-Step Instructions:
  - Provide comprehensive, step-by-step instructions for setting up and running all aspects of the project to ensure easy reproducibility.
- Result Collector:
  - Develop a script to automatically collect and organize all experimental results for easier analysis and comparison.
- t-SNE Visualization:
  - Include a module to perform t-SNE visualization of the results, making it simpler to reproduce the visualizations presented in the paper.
- Extended Documentation:
  - Expand the documentation to cover advanced usage scenarios and troubleshooting tips.
- Performance Metrics:
  - Implement additional metrics to evaluate the model performance more thoroughly.
The commercialization of text-to-image diffusion models (DMs) brings forth potential copyright concerns. Despite numerous attempts to protect DMs from copyright issues, the vulnerabilities of these solutions are underexplored. In this study, we formalized the Copyright Infringement Attack on generative AI models and proposed a backdoor attack method, SilentBadDiffusion, to induce copyright infringement without requiring access to or control over training processes.
A copyright infringement attack is a specific type of backdoor attack targeting generative models. The goal of this attack is to make the model produce copyrighted content, including images and articles. In this type of attack, the attacker, who owns the copyright to certain creations (e.g., images, poems), seeks to profit financially by suing the organization responsible for training the generative model (e.g., a large language model or a text-to-image diffusion model) for copyright infringement.
Our method strategically embeds connections between pieces of copyrighted information and text references in poisoning data while carefully dispersing that information, making the poisoning data inconspicuous when integrated into a clean dataset. Our experiments show the stealth and efficacy of the poisoning data. When given specific text prompts, DMs trained with a poisoning ratio of 0.20% can produce copyrighted images. Additionally, the results reveal that the more sophisticated the DMs are, the more easily the attack succeeds.
These findings underline potential pitfalls in the prevailing copyright protection strategies and underscore the necessity for increased scrutiny to prevent the misuse of DMs.
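To give a sense of scale for the 0.20% poisoning ratio, the sketch below computes how many poisoning samples that ratio implies for a given clean dataset. It assumes the ratio is defined as the fraction of poisoning samples in the combined (clean plus poisoning) dataset, which this README does not state explicitly; the function name and the dataset size are hypothetical and chosen only for illustration.

```python
import math


def poison_sample_count(clean_size: int, poisoning_ratio: float) -> int:
    """Smallest number of poisoning samples p such that
    p / (clean_size + p) >= poisoning_ratio.

    Assumption: the ratio is measured against the combined dataset.
    """
    if clean_size < 0:
        raise ValueError("clean_size must be non-negative")
    if not 0.0 <= poisoning_ratio < 1.0:
        raise ValueError("poisoning_ratio must be in [0, 1)")
    # Solving p / (clean + p) = r for p gives p = r * clean / (1 - r).
    return math.ceil(poisoning_ratio * clean_size / (1.0 - poisoning_ratio))


if __name__ == "__main__":
    # Hypothetical clean dataset of 100,000 image-text pairs.
    print(poison_sample_count(100_000, 0.002))
```

Under this assumption, a clean set of 100,000 pairs would need roughly 200 poisoning samples to reach the 0.20% ratio.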
- Install required packages:
    pip install xformers==0.0.23 torchvision==0.16.1
    pip install -r requirements.txt
- Clone the Grounded-Segment-Anything repository and follow the installation instructions:
    git clone https://github.com/IDEA-Research/Grounded-Segment-Anything.git
  Alternatively, you can try the following steps:
    cd Grounded-Segment-Anything
    export AM_I_DOCKER=False
    export BUILD_WITH_CUDA=True
    export CUDA_HOME=/path/to/cuda-11.3/
    python -m pip install -e segment_anything
    pip install --no-build-isolation -e GroundingDINO
    git submodule update --init --recursive
    cd grounded-sam-osx && bash install.sh
    pip install opencv-python pycocotools matplotlib onnxruntime onnx ipykernel
- Download the required checkpoints into the checkpoints folder:
    mkdir checkpoints
    cd checkpoints
    wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth
    wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
    wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
    wget https://dl.fbaipublicfiles.com/sscd-copy-detection/sscd_disc_mixup.torchscript.pt
    wget https://dl.fbaipublicfiles.com/sscd-copy-detection/sscd_disc_large.torchscript.pt
    wget https://dl.fbaipublicfiles.com/sscd-copy-detection/sscd_imagenet_mixup.torchscript.pt
- Set your OpenAI API key:
    export OPENAI_API_KEY='yourkey'
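Before moving on, it can help to confirm that the setup steps above took effect. The following is a minimal sketch and not part of the repository: it checks the two pinned package versions, looks for the six checkpoint files named in the download URLs, and verifies that OPENAI_API_KEY is set. The function names and the default checkpoint directory (a checkpoints folder relative to the current working directory) are assumptions; adjust the path to wherever you created the folder.

```python
import os
from importlib import metadata
from pathlib import Path

# Versions pinned in the install command above.
PINNED_PACKAGES = {"xformers": "0.0.23", "torchvision": "0.16.1"}

# File names taken from the download URLs above.
CHECKPOINT_FILES = [
    "groundingdino_swinb_cogcoor.pth",
    "groundingdino_swint_ogc.pth",
    "sam_vit_h_4b8939.pth",
    "sscd_disc_mixup.torchscript.pt",
    "sscd_disc_large.torchscript.pt",
    "sscd_imagenet_mixup.torchscript.pt",
]


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def preflight(checkpoint_dir="checkpoints", env=None, version_lookup=installed_version):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []

    for package, wanted in PINNED_PACKAGES.items():
        found = version_lookup(package)
        if found is None:
            problems.append(f"{package} is not installed (expected {wanted})")
        elif found.split("+")[0] != wanted:
            # Builds such as "0.16.1+cu118" carry a local suffix; compare the base version only.
            problems.append(f"{package} is {found}, expected {wanted}")

    for name in CHECKPOINT_FILES:
        if not (Path(checkpoint_dir) / name).is_file():
            problems.append(f"missing checkpoint: {name}")

    key = env.get("OPENAI_API_KEY", "")
    if not key or key == "yourkey":
        problems.append("OPENAI_API_KEY is unset or still the placeholder value")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

The check only reports problems and never modifies the environment, so it is safe to rerun after each setup step.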
- Download the Datasets:
  - Run download.py, located in the ./datasets folder, to download the necessary datasets.
- Generate Poisoning Data:
  - Execute src/poisoning_data_generation.py to create the poisoning data required for the experiment.
- Run the Attack Experiment:
  - Use src/target_model_training.py to carry out the attack experiment.
  - Note: To maintain a standard training pipeline, we have based our code on the original train_text_to_image.py from diffusers 0.27.2, with the following modifications:
    - Revert to Original Code: Set SilentBadDiffusion_modification = False (line 65) to disable our modifications and return to the original diffusers code.
    - Added Code Snippets:
      - Loading Data (Lines 490-527): Additional code for loading data.
      - Visualization (Lines 828-840): Added visualization steps.
      - Saving Model (Lines 870-893): Code for saving the trained model.
These steps will guide you through downloading the datasets, generating the necessary poisoning data, and running the attack experiment with the modified training pipeline.
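The three steps above can be chained with a small driver script. This is a hedged sketch, not something shipped with the repository: it assumes each script can be launched from the repository root without extra command-line arguments, which this README does not confirm, so check each script's argument parser before relying on it. The names PIPELINE_SCRIPTS, build_commands, and run_pipeline are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Script paths as named in the steps above, relative to the repository root.
PIPELINE_SCRIPTS = [
    "datasets/download.py",
    "src/poisoning_data_generation.py",
    "src/target_model_training.py",
]


def build_commands(repo_root, python=sys.executable):
    """Return one command (as an argument list) per pipeline step, in order."""
    root = Path(repo_root).resolve()
    return [[python, str(root / script)] for script in PIPELINE_SCRIPTS]


def run_pipeline(repo_root):
    """Run each step in order and stop at the first one that fails."""
    root = Path(repo_root).resolve()
    for command in build_commands(root):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on top of a failed earlier step.
        subprocess.run(command, check=True, cwd=root)


# Example (assumes the current directory is the repository root):
# run_pipeline(".")
```

Stopping at the first failure matters here because the training step depends on the poisoning data produced by the step before it.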
- Haonan Wang: haonan.wang@u.nus.edu
BibTeX:
@article{wang2024stronger,
title={The stronger the diffusion model, the easier the backdoor: Data poisoning to induce copyright breaches without adjusting finetuning pipeline},
author={Wang, Haonan and Shen, Qianli and Tong, Yao and Zhang, Yang and Kawaguchi, Kenji},
journal={arXiv preprint arXiv:2401.04136},
year={2024}
}