[DPO] Adding weighted preference optimization (WPO) #2141
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@gaetanlop the tests for visual LLM models are failing, I believe due to an extra argument?
@kashif my bad, should be fixed now
Hey @kashif, I don’t think the failing test is related to the PR. Am I right?
No, see #2147 (comment) |
Looks like #2131 has been merged. I will check if there are any modifications required. |
What does this PR do?
Adding WPO to the DPOTrainer. The paper is on the list of accepted papers at EMNLP 2024 and introduces an elegant method to simulate on-policy data while training on off-policy preference pairs. It works by prioritizing the pairs that are most probable under the current policy during optimization. Implementation-wise, it is independent of the loss function being used, as it only provides a weighting for each sample pair during loss computation (see the sketch below). My implementation is based on the authors' implementation in https://github.com/wzhouad/WPO/blob/main/scripts/run_wpo.py
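To illustrate the idea, here is a minimal sketch of the weighting step, not the exact code added in this PR: the per-pair DPO losses are scaled by the policy's (length-normalized) probability of generating both the chosen and the rejected response, so pairs the current policy is likely to produce dominate the update. Function and argument names are illustrative.

```python
import torch


def wpo_weighted_loss(
    dpo_losses: torch.Tensor,            # per-pair DPO losses, shape (batch,)
    policy_chosen_logps: torch.Tensor,   # length-averaged log p(chosen | prompt) under the policy
    policy_rejected_logps: torch.Tensor, # length-averaged log p(rejected | prompt) under the policy
) -> torch.Tensor:
    """Reweight per-pair preference losses by the policy's probability of the pair.

    Pairs that the current policy is more likely to generate receive a weight
    closer to 1, approximating on-policy sampling from off-policy data.
    """
    with torch.no_grad():
        # weight = p_theta(chosen) * p_theta(rejected), computed in log space
        # and clamped so it never exceeds 1; no gradient flows through the weight
        weights = torch.clamp(
            torch.exp(policy_chosen_logps + policy_rejected_logps), max=1.0
        )
    return (dpo_losses * weights).mean()
```

Because the weighting only rescales each pair's loss, it composes with any of the existing DPO loss variants rather than replacing them.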
Before submitting
Did you read the contributor guideline, Pull Request section?
Who can review?
@kashif