Always allow `ref_model=None` #2047

qgallouedec · 2024-09-10T07:01:59Z

Feature request

When optimising with a reference model, the reference model is in most cases the same as the trained model. We should require the user to specify a ref model only when they don't want to use the trained model.

Currently this is possible, but only when using PEFT, which is very counter-intuitive. Even in that case, if you want to provide a ref model that differs from the trained model, you have to set force_use_ref_model, which is even more counter-intuitive.

Currently

model = ref_model and no peft

DPOTrainer(model=model, ref_model=ref_model)  # ref_model must be a separate instance of the same model

model = ref_model and peft

DPOTrainer(model=model)

model != ref_model and no peft

DPOTrainer(model=model, ref_model=ref_model)

model != ref_model and peft

args = DPOConfig(force_use_ref_model=True)
DPOTrainer(model=model, ref_model=ref_model, args=args)

Proposed

model = ref_model

DPOTrainer(model=model)

model != ref_model

DPOTrainer(model=model, ref_model=ref_model)
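
A minimal sketch of how the proposed behaviour could be resolved inside a trainer, assuming the reference model defaults to an independent copy of the trained model. `resolve_ref_model` is a hypothetical helper for illustration, not an existing TRL function, and a real implementation would also have to account for PEFT and DeepSpeed.

```python
import copy


def resolve_ref_model(model, ref_model=None):
    """Hypothetical helper: pick the reference model for a trainer."""
    if ref_model is None:
        # No ref model given: fall back to an independent copy of the trained model.
        return copy.deepcopy(model)
    if ref_model is model:
        # Sharing one object would let training updates leak into the reference.
        raise ValueError(
            "ref_model must not be the same object as model; pass ref_model=None instead."
        )
    return ref_model
```

With this shape, `resolve_ref_model(model)` covers the "model = ref_model" case and `resolve_ref_model(model, other_model)` covers the "model != ref_model" case, regardless of any other setting.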

Motivation

Make the library more intuitive to use.

Your contribution

For sure ;)

RylanSchaeffer · 2024-09-10T14:07:12Z

Quoting from the other issue:

> However, handling ref_model/model is pretty tricky currently, maybe wait until #2047 is solved?

Is there an explanation for why ref_model and model are tricky? If I were to work on this, should I be wary of any challenges that might pop up?

qgallouedec · 2024-09-10T15:11:29Z

I believe that this may be due to the implementation being carried out in multiple stages: first the initial version, followed by PEFT support, then integration with DeepSpeed... It's probably a good time to re-think it as a whole.
However, we must be careful not to introduce any regressions or breaking changes: we must test all the parameter combinations.
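
As a rough sketch of what "all the parameter combinations" could mean, the cases can be enumerated up front and fed to a test suite. The axes below are assumptions drawn from the cases mentioned in this thread (PEFT, DeepSpeed, and how `ref_model` is passed); the names are hypothetical and not part of TRL.

```python
from itertools import product

# Hypothetical axes, taken from the cases discussed in this issue.
USE_PEFT = (False, True)
USE_DEEPSPEED = (False, True)
REF_MODEL = ("none", "same_object", "different")


def parameter_combinations():
    """Return every (use_peft, use_deepspeed, ref_model) case as a dict."""
    return [
        {"use_peft": p, "use_deepspeed": d, "ref_model": r}
        for p, d, r in product(USE_PEFT, USE_DEEPSPEED, REF_MODEL)
    ]
```

Each dict could then drive one parametrized test that builds the trainer and checks that it either trains or raises a clear error.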

RylanSchaeffer · 2024-09-10T15:12:56Z

In that case, I think it makes sense to just fix the other issue first because the fix for that issue is an equality check, right?

qgallouedec · 2024-09-11T18:14:58Z

> In that case, I think it makes sense to just fix the other issue first because the fix for that issue is an equality check, right?

Perhaps you should give it a try. It's difficult to assess the changes involved.

qgallouedec · 2024-09-13T08:21:25Z

Implemented for Online DPO in #2041. It can probably be taken as a reference.

qgallouedec added the ✨ enhancement New feature or request label Sep 10, 2024

qgallouedec mentioned this issue Sep 10, 2024

PPOv2Trainer & RLOOTrainer - Add Safety Check that policy object != ref_policy object #2046

Closed

qgallouedec changed the title ~~Always allow ˋref_model=Noneˋ~~ Always allow ref_model=None Sep 10, 2024

qgallouedec added 🙋 help from community wanted Open invitation for community members to contribute 🧒 good second issue Good for contributors with basic project familiarity 🏋 DPO Related to DPO labels Oct 21, 2024
