Support for MiniCPM-V Reinforcement Learning with Direct Preference Optimization (DPO) #2326
Comments
We have an example script to train a VLM with DPO here. Have you tried running it with MiniCPM-V?
Alright, cool! I'll try it out and provide an update. Thanks for your response!
Hi, I tried running the script with MiniCPM-V but came across this error: (base) PS C:\Users\userAdmin\RLHF_V_MiniCPMV> accelerate launch dpo_vlm_2.py
It seems to be related to GenerationMixin. Is there any way to solve this? Thanks.
Feature request
Hi! I’d like to request support for reinforcement learning with DPO for the MiniCPM-V model. I'm not sure whether the current state of this repository allows this vision model to be trained this way. Could I get some advice or insight on that? Would the current approach for applying DPO to VLMs work for the majority of VLMs on Hugging Face?
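For context on what "applying DPO" needs from a model, here is a minimal sketch of the standard DPO loss for a single preference pair. It assumes the summed log-probabilities of the chosen and rejected responses are already available from the policy and from a frozen reference model. The function name `dpo_loss`, its argument names, and the default `beta` are illustrative only and are not taken from this repository's example script.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response,
    conditioned on the same prompt (and image, for a VLM).
    All names here are hypothetical, chosen for illustration.
    """
    # How much more the policy prefers "chosen" than the reference does.
    margin = (policy_chosen_logp - policy_rejected_logp) \
        - (ref_chosen_logp - ref_rejected_logp)
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# The policy favors the chosen response more than the reference does,
# so the loss falls below log(2), its value at zero margin.
print(dpo_loss(-1.0, -5.0, -3.0, -3.0))
```

Under this formulation, the loss itself only needs per-sequence log-probabilities, so whether a given VLM works would presumably depend on how its processor and forward pass fit the training script rather than on the objective.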
Motivation
None
Your contribution
None