Add preference optimization (Diffusion-DPO, MaPO) #1427
base: dev

Conversation
Do you have any training samples from this?
36ee5259 = pickapic dataset sample (500 preferences), 10 epochs, LoRA 16/16 with Prodigy d_coef at 1.25, MaPO with a contribution weight of …. Papers generally suggest around 1000 preferences. I have been making a preference-creation tool so one could build their own preferences on their own dataset.
I'm finding I need the learning rate as low as 1e-6 (for an SDXL LoRA), possibly lower with a bigger dataset. I also had text encoder training disabled. One thing I want to try is using real chosen images plus AI-generated rejected images, inspired by what LLM folks have been doing to bypass the need for collecting real preference pairs.
Which did you try, Diffusion-DPO or MaPO? I found it trained slowly at 1e-4 on SD 1.5 with a MaPO weight of 0.1. Haven't done a full hyperparameter sweep yet though.
MaPO with LR 1e-6 and beta 0.1 on SDXL. My dataset consists of real images in the target style, each with a matching AI-generated image (without style prompts) as the rejected image. The differences are extreme, so maybe that's why I need lower learning rates?
The `mapo_weight` is described as the contribution factor: it scales the preference term, which is basically the difference between the preference and non-preference losses. So you could drop the weight to something like 0.05 and keep your original LR. I'm not sure which is more efficient though.
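Roughly what I mean, as a sketch only (the names here are placeholders, not the actual variables in this PR):

```python
import torch

def weighted_preference_loss(base_loss: torch.Tensor,
                             chosen_loss: torch.Tensor,
                             rejected_loss: torch.Tensor,
                             mapo_weight: float = 0.1) -> torch.Tensor:
    # Hypothetical illustration: the contribution is the gap between the
    # non-preferred and preferred losses, scaled down by mapo_weight.
    return base_loss + mapo_weight * (rejected_loss - chosen_loss)
```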
A few more observations with AdamW and my real-chosen, synthetic-rejected dataset:
I haven't managed to completely eliminate the artifacts, so my only option is early stopping. I've seen this with other preference optimization papers (if you look, they all train for very short runs of around 2000 steps) and it's annoying that it's never addressed.
I finally sat down and built a real preference dataset and found that pairs of images generated by the same model don't cause as many artifacts (probably because they come from the same distribution and any encoding artifacts cancel out).
Add preference optimization (PO) support
Add paired images in the dataset.
Preference optimization algorithms implemented: Diffusion-DPO and MaPO.
Currently looking for feedback on the implementation.
Decisions I made and why:
Pairing images
Pairing images in ImageSetInfo (extending ImageInfo)
A batch size of 1 will load 2 captions and 2 images for each requested image/caption pair.
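As an illustration, a minimal sketch of what such a pairing container might look like (the field names are assumptions; the real ImageInfo in the training utilities carries more fields):

```python
from dataclasses import dataclass

@dataclass
class ImageInfo:
    # Simplified stand-in for the existing ImageInfo (the real class has more fields).
    image_key: str
    caption: str
    absolute_path: str

@dataclass
class ImageSetInfo(ImageInfo):
    # Hypothetical extension: the preferred ("w") entry carries its rejected ("l")
    # counterpart, so one dataset item yields 2 images and 2 captions.
    rejected_caption: str = ""
    rejected_absolute_path: str = ""
```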
Dataset
Datasets can be defined as "preference" via `args.preference` (`--preference`) and in `dataset_config.toml`. To create a dataset pattern, we are hard-coding `dataset/1_name/w` and `dataset/1_name/l`. You would then have a typical DreamBooth-like dataset with the following:

- `dataset/1_name/w/image.png`
- `dataset/1_name/w/image.caption`
- `dataset/1_name/l/image.png`
- `dataset/1_name/l/image.caption`

Note: `w` and `l` are laid out like the typical dataset with image/caption file pairs; files use the same name in both directories to create the pairs. It would be good to consider other dataset file patterns.
Preference dataset examples:
The pickapic dataset is a preference between 2 images; it shows the pairing and embeds the 2 images into the dataset.
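As an illustration, one way pickapic-style rows could be exported into the `w`/`l` layout above; the column names (`jpg_0`, `jpg_1`, `label_0`, `caption`) are assumptions based on the public dataset, not something this PR requires:

```python
from pathlib import Path

def export_pickapic_row(row: dict, out_dir: str, name: str) -> None:
    """Write one pickapic-style row into the w/ and l/ layout (sketch only).

    Assumes the row carries raw JPEG bytes in jpg_0/jpg_1, a preference label
    in label_0 (1 meaning image 0 was preferred), and a caption.
    """
    chosen, rejected = ("jpg_0", "jpg_1") if row["label_0"] == 1 else ("jpg_1", "jpg_0")
    for sub, key in (("w", chosen), ("l", rejected)):
        sub_dir = Path(out_dir) / sub
        sub_dir.mkdir(parents=True, exist_ok=True)
        (sub_dir / f"{name}.jpg").write_bytes(row[key])       # image bytes
        (sub_dir / f"{name}.caption").write_text(row["caption"])  # shared caption
```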
Caption prefix/suffix for preference/non-preference
A prefix/suffix allows some techniques for moving away from certain concepts. Different values can be set for the preference and non-preference captions, to give flexibility in experimentation.
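For example, a hypothetical helper (the parameter names are placeholders, not the PR's actual options):

```python
def build_captions(caption: str,
                   pref_prefix: str = "", pref_suffix: str = "",
                   nonpref_prefix: str = "", nonpref_suffix: str = "") -> tuple[str, str]:
    # The chosen and rejected captions can diverge, e.g. tagging only the
    # rejected side with a concept you want to move away from.
    chosen = f"{pref_prefix}{caption}{pref_suffix}"
    rejected = f"{nonpref_prefix}{caption}{nonpref_suffix}"
    return chosen, rejected
```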
Training
PO was added into the main training script to allow flexibility, but it will be moved to the typical functions for this. I have it set up for network training, but it would work with the other scripts.
Hyperparameters
- `--beta_dpo` = KL-divergence parameter beta for Diffusion-DPO. `2500` for 1.5 and `5000` for SDXL were what I have found suggested.
- `--mapo_weight` = MaPO contribution factor. Start around `0.1`, but adjusting this can be helpful for controlling how much of a contribution the preference optimization has on the training. See TODO.
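For context, a self-contained sketch of the Diffusion-DPO objective as it is commonly written, showing where `beta_dpo` enters; this is an illustration under my reading, not the code in this PR:

```python
import torch
import torch.nn.functional as F

def diffusion_dpo_loss(model_err_w: torch.Tensor, model_err_l: torch.Tensor,
                       ref_err_w: torch.Tensor, ref_err_l: torch.Tensor,
                       beta_dpo: float = 5000.0) -> torch.Tensor:
    """Sketch of the Diffusion-DPO loss.

    Each *_err tensor is the per-sample MSE between the added noise and the
    noise prediction, for the chosen (w) / rejected (l) image, from the model
    being trained and from a frozen reference model respectively.
    """
    model_diff = model_err_w - model_err_l   # how strongly the model prefers w over l
    ref_diff = ref_err_w - ref_err_l         # same quantity for the reference model
    inside = -0.5 * beta_dpo * (model_diff - ref_diff)
    return -F.logsigmoid(inside).mean()
```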
Possible issues
Preference and regular training datasets mixed
This mixing would need to be worked on for batch sizes higher than 1. We assume chunking of pairs, so unpaired images won't work that way.
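To illustrate the chunking assumption, a minimal sketch (it assumes the collated batch stacks all chosen samples first, then all rejected samples):

```python
import torch

def split_preference_batch(latents: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    # Assumes the batch is ordered [chosen_0..chosen_n, rejected_0..rejected_n];
    # a stray unpaired sample would break the even split.
    assert latents.shape[0] % 2 == 0, "preference batches must contain whole pairs"
    chosen, rejected = latents.chunk(2, dim=0)
    return chosen, rejected
```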
The implementations may not be accurate
If you see something that isn't correct, let me know.
Usage
State: This is currently working and producing favorable results.
Image/caption pairs are stored in the `w` and `l` directories.

NOTE: Use the same file name for images in the `w` and `l` directories to make them paired, or set this up in your dataset config.
Related tickets: #1040