[Feature Request] Add DPO train #1040
1. Load the LoRA.
2. Compute the final loss.
3. Save the LoRA.
Some users like me are reporting errors with the trainer... I don't have time to test more right now. I plan to get back to testing this weekend.
The DPO loss requires predictions from the reference model, so we also need a way to toggle LoRA modules during training. I've been faking it with
I think it would be good to add a toggle API to the network module.
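Something like the following context manager might be enough for a first pass at toggling — just a sketch, assuming the network object exposes a `set_multiplier()` method (as kohya's `LoRANetwork` does); everything else here is illustrative:

```python
from contextlib import contextmanager

@contextmanager
def network_disabled(network):
    """Temporarily zero out the LoRA contribution so the forward pass
    reflects the frozen reference model, then restore the old multiplier."""
    original = getattr(network, "multiplier", 1.0)
    network.set_multiplier(0.0)
    try:
        yield
    finally:
        network.set_multiplier(original)

# Usage inside a training step (sketch):
#   with torch.no_grad(), network_disabled(network):
#       ref_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
```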
Would like to discuss how this improvement would look. Dataset: Pairing images together with their preferences, possibly also supporting 4-image preference data. As an example, the kashif/pickascore dataset has images in pairs with a preference (+/-/=), i.e. winner/loser/even or preferred/not-preferred/no-preference.
List out image pairs and preferences.
Captions and other dataset features can be declared the same way as in the dreambooth dataset. This allows the dataset to remain the same while we set the preferences in this separate file. Loading the dataset would require aligning these image pairs, so a complete preference file would be needed.
Alternatively, complete a full metadata file including the preference data. This would align well, since this JSON/dataset file would need to be made anyway and would put it all together. We would probably want to collate the images into pairs and work through them separately in the fine-tuning dataset, rather than listing each image separately and trying to pair them after the fact. Edit: for each dataset, have an A/B or W/L subset using identical naming.
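As a strawman for the dataset side, the preference metadata and its pairing at load time could look something like the sketch below; the field names (`pair_id`, `image_w`, `image_l`, `label`) and the layout are only illustrative, not a settled format:

```python
import json

# Hypothetical preference metadata: each record points at a winner/loser
# image pair plus a shared caption. The "=" label marks a no-preference pair.
EXAMPLE_METADATA = """
[
  {"pair_id": 0, "caption": "a photo of a cat",
   "image_w": "img/0001_a.png", "image_l": "img/0001_b.png", "label": "w"},
  {"pair_id": 1, "caption": "a watercolor landscape",
   "image_w": "img/0002_a.png", "image_l": "img/0002_b.png", "label": "="}
]
"""

def load_preference_pairs(text):
    """Return (winner_path, loser_path, caption) tuples, skipping ties."""
    pairs = []
    for rec in json.loads(text):
        if rec["label"] == "=":
            continue  # no-preference pairs carry no DPO signal
        pairs.append((rec["image_w"], rec["image_l"], rec["caption"]))
    return pairs

if __name__ == "__main__":
    print(load_preference_pairs(EXAMPLE_METADATA))
```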
Training: In training we have pairs, or up to 4 preferences (if we want to go down the 4-preference path). We calculate the loss as normal and then separate the losses into the appropriate pairings.
Then we calculate the loss on the model without the LoRA.
Then we apply the DPO loss.
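For concreteness, the linked diffusers diffusion_dpo example assembles the loss roughly like this; treat it as a sketch, with `beta` and the batch layout (winner samples concatenated before loser samples) taken as assumptions carried over from that example:

```python
import torch.nn.functional as F

def diffusion_dpo_loss(model_pred, ref_pred, target, beta=5000.0):
    """Diffusion-DPO loss over a batch laid out as [winners; losers]."""
    # Per-sample MSE between predicted and target noise.
    model_losses = F.mse_loss(model_pred.float(), target.float(),
                              reduction="none").mean(dim=[1, 2, 3])
    ref_losses = F.mse_loss(ref_pred.float(), target.float(),
                            reduction="none").mean(dim=[1, 2, 3])

    # Split each per-sample loss vector into its winner/loser halves.
    model_losses_w, model_losses_l = model_losses.chunk(2)
    ref_losses_w, ref_losses_l = ref_losses.chunk(2)

    model_diff = model_losses_w - model_losses_l
    ref_diff = ref_losses_w - ref_losses_l

    # Prefer the winner: push the adapted model's (w - l) gap below the reference's.
    inside_term = -0.5 * beta * (model_diff - ref_diff)
    return -F.logsigmoid(inside_term).mean()
```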
So to get this working, we need to be able to run the original model without the network applied, or to bypass the network. How to do that has been discussed above, along with modifications that would make it more streamlined. If anyone has input on the following, or progress they have been making towards these goals, we can collaborate and bring together the features needed to make this possible.

Separately, it would be interesting to discuss a storage format for the preference dataset itself. This would allow extensions and other tools to create datasets for use in training, and possibly to format them so they work well together. I'm using the pickascore dataset as a possible option, but I'm willing to consider different ones. The pickascore dataset stores the individual images as well as their preference, prompt, and other metadata, so it is all-inclusive. Personally I'm not sure what the best option is, but I have explored what could be possible here. I'm looking to add preference workflows into different tools, so I would like to hear what others want as a dataset format, or any concerns with a pickascore-like dataset. Thanks for reading.
Now the SOTA will be SPO.
SPO looks interesting, but it relies on a model evaluator whose architecture hasn't been described yet.
Relevant papers related to preference optimization: DDPO. I think having the preference dataset will give us the flexibility to implement these different optimization paths.
New paper related to ORPO: MaPO.
source:
https://github.com/huggingface/diffusers/blob/main/examples/research_projects/diffusion_dpo