
Support for MiniCPM-V Reinforcement Learning with Direct Preference Optimization (DPO) #2326

Open
DarioPTWR opened this issue Nov 5, 2024 · 3 comments
Labels
🏋 DPO Related to DPO ❓ question Seeking clarification or more information 👁️ VLM Related to Visual Language Models

Comments

@DarioPTWR

Feature request

Hi! I’d like to request support for reinforcement learning with DPO for the MiniCPM-V model. I'm not sure whether the current state of this repository allows this vision model to be retrained as well; could I get some advice or insight into that? Would the current approach for applying DPO to VLMs work for the majority of VLMs on Hugging Face?

Motivation

None

Your contribution

None

@qgallouedec
Member

We have an example script to train VLMs with DPO here. Have you tried to run it with MiniCPM-V?
At present, we're not claiming that you can use it with any VLM, as the level of standardization of VLMs is lower than that of LLMs. But it's definitely worth giving this one a try.
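For context, a minimal sketch of the preference-data format the TRL VLM DPO example consumes: each record pairs one prompt (with an image) against a preferred ("chosen") and a dispreferred ("rejected") completion. The field names follow TRL's convention; whether MiniCPM-V's processor accepts them unchanged is untested, and the values below are placeholders.

```python
# Sketch of one image-conditioned preference record for DPOTrainer.
# Field names ("prompt", "images", "chosen", "rejected") follow TRL's
# vision DPO examples; the concrete values here are made up.

def make_dpo_record(prompt, image, chosen, rejected):
    """Build one image-conditioned preference record."""
    return {
        "prompt": prompt,      # text prompt shown alongside the image
        "images": [image],     # TRL's vision DPO scripts expect a list
        "chosen": chosen,      # preferred model response
        "rejected": rejected,  # dispreferred model response
    }

record = make_dpo_record(
    prompt="What is shown in the picture?",
    image="path/to/image.png",  # placeholder; real scripts pass PIL images
    chosen="A tabby cat sleeping on a windowsill.",
    rejected="A dog running on a beach.",
)
print(sorted(record))  # → ['chosen', 'images', 'prompt', 'rejected']
```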

@qgallouedec qgallouedec added ❓ question Seeking clarification or more information 🏋 DPO Related to DPO 👁️ VLM Related to Visual Language Models labels Nov 5, 2024
@DarioPTWR
Author

Alright cool! Will try it out and provide an update, thanks for your response!

@DarioPTWR
Author

Hi, I've tried to run the script with MiniCPM-V, but came across this error:

    (base) PS C:\Users\userAdmin\RLHF_V_MiniCPMV> accelerate launch dpo_vlm_2.py
    The following values were not passed to `accelerate launch` and had defaults used instead:
        --num_processes was set to a value of 0
        --num_machines was set to a value of 1
        --mixed_precision was set to a value of 'no'
        --dynamo_backend was set to a value of 'no'
    To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
    MiniCPMForCausalLM has generative capabilities, as prepare_inputs_for_generation is explicitly overwritten. However, it doesn't directly inherit from GenerationMixin. From 👉v4.50👈 onwards, PreTrainedModel will NOT inherit from GenerationMixin, and this model will lose the ability to call generate and other related functions.
      • If you're using trust_remote_code=True, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
      • If you are the owner of the model architecture code, please modify your model class such that it inherits from GenerationMixin (after PreTrainedModel, otherwise you'll get an exception).
      • If you are not the owner of the model architecture class, please contact the model code owner to update it.
    Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 6.69it/s]
    Traceback (most recent call last):
      File "C:\Users\userAdmin\RLHF_V_MiniCPMV\dpo_vlm_2.py", line 78, in <module>
        main()
      File "C:\Users\userAdmin\RLHF_V_MiniCPMV\dpo_vlm_2.py", line 66, in main
        trainer = DPOTrainer(
      File "c:\Users\userAdmin\RLHF_V_MiniCPMV\.venv\Lib\site-packages\huggingface_hub\utils\_deprecation.py", line 101, in inner_f
        return f(*args, **kwargs)
      File "c:\Users\userAdmin\RLHF_V_MiniCPMV\.venv\Lib\site-packages\transformers\utils\deprecation.py", line 165, in wrapped_func
        return func(*args, **kwargs)
      File "c:\Users\userAdmin\RLHF_V_MiniCPMV\.venv\Lib\site-packages\trl\trainer\dpo_trainer.py", line 367, in __init__
        model.enable_input_require_grads()
      File "c:\Users\userAdmin\RLHF_V_MiniCPMV\.venv\Lib\site-packages\transformers\modeling_utils.py", line 1873, in get_input_embeddings
        raise NotImplementedError
    NotImplementedError
    Traceback (most recent call last):
      File "<frozen runpy>", line 198, in _run_module_as_main
      File "<frozen runpy>", line 88, in _run_code
      File "C:\Users\userAdmin\RLHF_V_MiniCPMV\.venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
      File "c:\Users\userAdmin\RLHF_V_MiniCPMV\.venv\Lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
        args.func(args)
      File "c:\Users\userAdmin\RLHF_V_MiniCPMV\.venv\Lib\site-packages\accelerate\commands\launch.py", line 1168, in launch_command
        simple_launcher(args)
      File "c:\Users\userAdmin\RLHF_V_MiniCPMV\.venv\Lib\site-packages\accelerate\commands\launch.py", line 763, in simple_launcher
        raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    subprocess.CalledProcessError: Command '['c:\\Users\\userAdmin\\RLHF_V_MiniCPMV\\.venv\\Scripts\\python.exe', 'dpo_vlm_2.py']' returned non-zero exit status 1.

Seems like it has something to do with GenerationMixin, or with get_input_embeddings not being implemented on the model. Is there any way to solve this? Thanks.
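Not an official fix, but one possible workaround, assuming the failure comes from MiniCPM-V's remote code not implementing get_input_embeddings on the wrapper model while the inner language model does: delegate the embedding accessors to the inner model before constructing DPOTrainer, so enable_input_require_grads() can reach the embedding layer. The attribute name `llm` is an assumption about MiniCPM-V's module layout; check the model's remote code to confirm it.

```python
# Hypothetical workaround: forward get/set_input_embeddings from the wrapper
# model to its inner language model so that DPOTrainer's
# enable_input_require_grads() can find the embedding layer.
# inner_attr="llm" is an assumption about MiniCPM-V's module layout.

def patch_input_embeddings(model, inner_attr="llm"):
    """Delegate embedding accessors to the inner language model."""
    inner = getattr(model, inner_attr)
    model.get_input_embeddings = inner.get_input_embeddings
    model.set_input_embeddings = inner.set_input_embeddings
    return model

# Usage (sketch): patch right after loading, before building the trainer.
# model = AutoModel.from_pretrained("openbmb/MiniCPM-V", trust_remote_code=True)
# model = patch_input_embeddings(model)
# trainer = DPOTrainer(model, ...)
```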
