Parameter-efficient fine-tuning of Stable Diffusion using (IA)^3.
| Before Fine-Tuning | After Fine-Tuning |
| --- | --- |

The prompt is "donald trump", and the model is fine-tuned on pokemon-blip-captions for 25 epochs.
Based on these papers:
- (IA)^3: Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
- Stable Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models
Implemented in diffusers using an attention processor in `attention.py`.
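
To illustrate the core idea: (IA)^3 leaves the base weights frozen and learns tiny per-channel scale vectors that rescale the keys and values inside attention. The block below is a minimal self-contained sketch of that mechanism in plain PyTorch; it is not the processor from `attention.py`, and all names in it are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IA3Attention(nn.Module):
    """Minimal (IA)^3-style attention sketch (not the repo's processor).

    The base projections stay frozen; only `l_k` and `l_v`, one scale per
    key/value channel, are trained. They start at 1 so the module initially
    behaves exactly like the base model.
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.to_out = nn.Linear(dim, dim)
        # The (IA)^3 parameters: just two vectors of size `dim`.
        self.l_k = nn.Parameter(torch.ones(dim))
        self.l_v = nn.Parameter(torch.ones(dim))

    def forward(self, x, context=None):
        context = x if context is None else context  # cross- or self-attention
        q = self.to_q(x)
        k = self.to_k(context) * self.l_k  # rescale keys elementwise
        v = self.to_v(context) * self.l_v  # rescale values elementwise
        b, n, d = q.shape
        h = self.num_heads
        q, k, v = (t.reshape(b, -1, h, d // h).transpose(1, 2) for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(b, n, d)
        return self.to_out(out)
```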
(IA)^3 has trade-offs similar to LoRA when compared to full fine-tuning. One major difference from LoRA is that (IA)^3 uses far fewer parameters, so it will generally be faster and smaller, but less expressive.
- Faster training
- Smaller file size (~222 KB for Stable Diffusion 1.5 when `learn_biases=False`, about twice as much otherwise)
- Can be swapped in and out of the base model during inference
- Can be loaded into fine-tuned models that have the same architecture
- Can be merged with the weights of the base model
  - Only possible when `learn_biases=False` without changing the architecture
  - Not currently implemented in this repo (see the sketch after this list)
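
Merging is possible because rescaling a layer's output elementwise is equivalent to scaling the rows of its frozen weight matrix. Since merging is not implemented in this repo, the following is only an illustrative sketch of that identity; the function name is hypothetical.

```python
import torch

@torch.no_grad()
def merge_ia3_scale(linear: torch.nn.Linear, scale: torch.Tensor) -> None:
    """Fold an (IA)^3 scale vector into a linear layer in place.

    For y = W x + b, rescaling the output gives scale * y =
    (diag(scale) W) x + scale * b, so the vector can be absorbed into the
    existing weights without changing the architecture. This only works
    when the fine-tuned parameters are pure scales (learn_biases=False).
    """
    linear.weight.mul_(scale.unsqueeze(1))  # row i of W scaled by scale[i]
    if linear.bias is not None:
        linear.bias.mul_(scale)
```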
First create an environment and install PyTorch. Then install the pip dependencies:

```
pip install -r requirements.txt
```
Currently, bitsandbytes only supports Linux, so fine-tuning on Windows requires more VRAM.
The training script is in `train.py`, based on this example script for diffusers. Currently you can change the parameters by editing the variables at the top of the file and running the script:

```
python train.py
```
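
The actual variable names at the top of `train.py` are not reproduced here; the block below is a hypothetical example of the kind of settings such a script exposes, using values mentioned elsewhere in this README (Stable Diffusion 1.5, pokemon-blip-captions, 25 epochs, `learn_biases`).

```python
# Hypothetical configuration variables; the real names in train.py may differ.
pretrained_model = "runwayml/stable-diffusion-v1-5"   # base SD 1.5 checkpoint
dataset_name = "lambdalabs/pokemon-blip-captions"     # fine-tuning dataset
num_epochs = 25
learn_biases = False          # pure scale vectors -> ~222 KB checkpoint
learning_rate = 1e-3
output_path = "ia3_weights.pt"
```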
The inference script is in `infer.py`; it loads the fine-tuned weights and generates images. Currently you can change the parameters by editing the variables at the top of the file and running the script:

```
python infer.py
```
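
For context, a bare-bones generation loop with diffusers looks like the following. Swapping in the (IA)^3 attention processors and loading the learned scale vectors is repo-specific and elided here; see `attention.py` and `infer.py` for the actual mechanism.

```python
import torch
from diffusers import StableDiffusionPipeline

# Standard diffusers inference; the (IA)^3 processors and weights would be
# installed into pipe.unet before generation (not shown, repo-specific).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("donald trump", num_inference_steps=30).images[0]
image.save("output.png")
```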