
Support for MiniCPM-V Reinforcement Learning with Direct Preference Optimization (DPO) #2326

Open
DarioPTWR opened this issue Nov 5, 2024 · 3 comments
Labels
🏋 DPO Related to DPO ❓ question Seeking clarification or more information 👁️ VLM Related to Visual Language Models

Comments

@DarioPTWR

Feature request

Hi! I’d like to request support for reinforcement learning with DPO for the MiniCPM-V model. I'm not sure whether the current state of this repository allows this vision model to be retrained as well; could I get some advice or insight into that? Would the current approach for applying DPO to VLMs work for the majority of VLMs on Hugging Face?

Motivation

None

Your contribution

None

@qgallouedec
Member

We have an example script to train VLMs with DPO here. Have you tried to run it with MiniCPM-V?
At present, we're not claiming that you can use it with any VLM, as the level of standardization of VLMs is lower than that of LLMs. But it's definitely worth giving this one a try.
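For context, a minimal sketch of the preference-data format the TRL VLM DPO example consumes: each record pairs one prompt (with an image) against a preferred ("chosen") and a dispreferred ("rejected") completion. The field names follow TRL's convention; whether MiniCPM-V's processor accepts them unchanged is untested, and the values below are placeholders.

```python
# Sketch of one image-conditioned preference record for DPOTrainer.
# Field names ("prompt", "images", "chosen", "rejected") follow TRL's
# vision DPO examples; the concrete values here are made up.

def make_dpo_record(prompt, image, chosen, rejected):
    """Build one image-conditioned preference record."""
    return {
        "prompt": prompt,      # text prompt shown alongside the image
        "images": [image],     # TRL's vision DPO scripts expect a list
        "chosen": chosen,      # preferred model response
        "rejected": rejected,  # dispreferred model response
    }

record = make_dpo_record(
    prompt="What is shown in the picture?",
    image="path/to/image.png",  # placeholder; real scripts pass PIL images
    chosen="A tabby cat sleeping on a windowsill.",
    rejected="A dog running on a beach.",
)
print(sorted(record))  # → ['chosen', 'images', 'prompt', 'rejected']
```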

@qgallouedec qgallouedec added ❓ question Seeking clarification or more information 🏋 DPO Related to DPO 👁️ VLM Related to Visual Language Models labels Nov 5, 2024
@DarioPTWR
Author

Alright cool! Will try it out and provide an update, thanks for your response!

@DarioPTWR
Author

Hi, I've tried to run the script with MiniCPM-V, but came across this error:

    (base) PS C:\Users\userAdmin\RLHF_V_MiniCPMV> accelerate launch dpo_vlm_2.py
    The following values were not passed to `accelerate launch` and had defaults used instead:
        --num_processes was set to a value of 0
        --num_machines was set to a value of 1
        --mixed_precision was set to a value of 'no'
        --dynamo_backend was set to a value of 'no'
    To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
    MiniCPMForCausalLM has generative capabilities, as prepare_inputs_for_generation is explicitly overwritten. However, it doesn't directly inherit from GenerationMixin. From 👉v4.50👈 onwards, PreTrainedModel will NOT inherit from GenerationMixin, and this model will lose the ability to call generate and other related functions.
      • If you're using trust_remote_code=True, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
      • If you are the owner of the model architecture code, please modify your model class such that it inherits from GenerationMixin (after PreTrainedModel, otherwise you'll get an exception).
      • If you are not the owner of the model architecture class, please contact the model code owner to update it.
    Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 6.69it/s]
    Traceback (most recent call last):
      File "C:\Users\userAdmin\RLHF_V_MiniCPMV\dpo_vlm_2.py", line 78, in <module>
        main()
      File "C:\Users\userAdmin\RLHF_V_MiniCPMV\dpo_vlm_2.py", line 66, in main
        trainer = DPOTrainer(
      File "c:\Users\userAdmin\RLHF_V_MiniCPMV\.venv\Lib\site-packages\huggingface_hub\utils\_deprecation.py", line 101, in inner_f
        return f(*args, **kwargs)
      File "c:\Users\userAdmin\RLHF_V_MiniCPMV\.venv\Lib\site-packages\transformers\utils\deprecation.py", line 165, in wrapped_func
        return func(*args, **kwargs)
      File "c:\Users\userAdmin\RLHF_V_MiniCPMV\.venv\Lib\site-packages\trl\trainer\dpo_trainer.py", line 367, in __init__
        model.enable_input_require_grads()
      File "c:\Users\userAdmin\RLHF_V_MiniCPMV\.venv\Lib\site-packages\transformers\modeling_utils.py", line 1873, in get_input_embeddings
        raise NotImplementedError
    NotImplementedError
    Traceback (most recent call last):
      File "<frozen runpy>", line 198, in _run_module_as_main
      File "<frozen runpy>", line 88, in _run_code
      File "C:\Users\userAdmin\RLHF_V_MiniCPMV\.venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
      File "c:\Users\userAdmin\RLHF_V_MiniCPMV\.venv\Lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
        args.func(args)
      File "c:\Users\userAdmin\RLHF_V_MiniCPMV\.venv\Lib\site-packages\accelerate\commands\launch.py", line 1168, in launch_command
        simple_launcher(args)
      File "c:\Users\userAdmin\RLHF_V_MiniCPMV\.venv\Lib\site-packages\accelerate\commands\launch.py", line 763, in simple_launcher
        raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    subprocess.CalledProcessError: Command '['c:\\Users\\userAdmin\\RLHF_V_MiniCPMV\\.venv\\Scripts\\python.exe', 'dpo_vlm_2.py']' returned non-zero exit status 1.

Seems like it has something to do with GenerationMixin, or with get_input_embeddings not being implemented on the model. Is there any way to solve this? Thanks.
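Not an official fix, but one possible workaround, assuming the failure comes from MiniCPM-V's remote code not implementing get_input_embeddings on the wrapper model while the inner language model does: delegate the embedding accessors to the inner model before constructing DPOTrainer, so enable_input_require_grads() can reach the embedding layer. The attribute name `llm` is an assumption about MiniCPM-V's module layout; check the model's remote code to confirm it.

```python
# Hypothetical workaround: forward get/set_input_embeddings from the wrapper
# model to its inner language model so that DPOTrainer's
# enable_input_require_grads() can find the embedding layer.
# inner_attr="llm" is an assumption about MiniCPM-V's module layout.

def patch_input_embeddings(model, inner_attr="llm"):
    """Delegate embedding accessors to the inner language model."""
    inner = getattr(model, inner_attr)
    model.get_input_embeddings = inner.get_input_embeddings
    model.set_input_embeddings = inner.set_input_embeddings
    return model

# Usage (sketch): patch right after loading, before building the trainer.
# model = AutoModel.from_pretrained("openbmb/MiniCPM-V", trust_remote_code=True)
# model = patch_input_embeddings(model)
# trainer = DPOTrainer(model, ...)
```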
