Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always allow ref_model=None #2047

Open
qgallouedec opened this issue Sep 10, 2024 · 5 comments
Open

Always allow ref_model=None #2047

qgallouedec opened this issue Sep 10, 2024 · 5 comments
Labels
🏋 DPO Related to DPO ✨ enhancement New feature or request 🧒 good second issue Good for contributors with basic project familiarity 🙋 help from community wanted Open invitation for community members to contribute

Comments

@qgallouedec
Copy link
Member

qgallouedec commented Sep 10, 2024

Feature request

For optimisation with reference model, in most cases the reference model is the same as the trained model. We should allow the user to specify the ref model only when they don't want to use the trained model.

Currently this is possible, but only when using PEFT, which is very counter-intuitive. And even using this situation, if you want to provide a ref model that is different from the trained model, you have to define force_use_model. Even more counter-intuitive.

Currently

  1. model = ref_model and no peft
DPOTrainer(model=model, ref_model= ref_model)  # where ref_model should be another instance
  1. model = ref_model and peft
DPOTrainer(model=model)
  1. model != ref_model and no peft
DPOTrainer(model=model, ref_model=ref_model)
  1. model != ref_model and peft
args = DPOConfig(force_use_ref_model=True)
DPOTrainer(model=model, ref_model=ref_model, args=args)

Proposed

  1. model = ref_model
DPOTrainer(model=model)
  1. model != ref_model
DPOTrainer(model=model, ref_model=ref_model)

Motivation

Make the lib use more intuitive.

Your contribution

For sure ;)

@RylanSchaeffer
Copy link
Contributor

RylanSchaeffer commented Sep 10, 2024

Quoting from the other issue:

However, handling ref_model/model is pretty tricky currently, maybe wait until #2047 is solved?

Is there an explanation for why ref_model and model are tricky? If I was to work on this, should I be wary of any challenges that might pop up?

@qgallouedec
Copy link
Member Author

I believe that this may be due to the implementation being carried out in multiple stages: first the initial version, followed by PEFT support, then integration with DeepSpeed... It's probably a good time to re-think it as a whole.
However, we must be careful not to introduce any regressions or breaking changes : we must test all the parameter combinations.

@RylanSchaeffer
Copy link
Contributor

In that case, I think it makes sense to just fix the other issue first because the fix for that issue is an equality check, right?

@qgallouedec qgallouedec changed the title Always allow ˋref_model=Noneˋ Always allow ref_model=None Sep 10, 2024
@qgallouedec
Copy link
Member Author

In that case, I think it makes sense to just fix the other issue first because the fix for that issue is an equality check, right?

Perhaps you should give it a try. It's difficult to assess the changes involved.

@qgallouedec
Copy link
Member Author

Implemented for Online DPO in #2041. It can probably be taken as reference

@qgallouedec qgallouedec added 🙋 help from community wanted Open invitation for community members to contribute 🧒 good second issue Good for contributors with basic project familiarity 🏋 DPO Related to DPO labels Oct 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏋 DPO Related to DPO ✨ enhancement New feature or request 🧒 good second issue Good for contributors with basic project familiarity 🙋 help from community wanted Open invitation for community members to contribute
Projects
None yet
Development

No branches or pull requests

2 participants