
[DPO] Adding weighted preference optimization (WPO) #2141

Merged 13 commits into huggingface:main on Oct 8, 2024

Conversation

@gaetanlop (Contributor) commented Sep 29, 2024

What does this PR do?

Adding WPO to the DPOTrainer. The paper is accepted at EMNLP 2024 and introduces an elegant method to simulate on-policy data while training on off-policy preference pairs. It works by prioritizing the most probable samples under the current policy during optimization.

Implementation-wise, it is independent of the loss function being used, as it only provides a weighting for each sample pair during loss computation. My implementation is based on the authors' implementation in https://github.com/wzhouad/WPO/blob/main/scripts/run_wpo.py. A rough sketch of the weighting is shown below.
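To make the idea concrete, here is a minimal illustrative sketch of the weighting, not the exact diff in this PR (the helper name `wpo_pair_weights`, the toy values, and the use of length-averaged log-probs are my own choices): each chosen/rejected pair is weighted by the detached joint probability of both completions under the current policy, so pairs the policy is likely to generate dominate the loss.

```python
import torch
import torch.nn.functional as F

def wpo_pair_weights(chosen_logps: torch.Tensor, rejected_logps: torch.Tensor) -> torch.Tensor:
    """Illustrative WPO-style pair weights.

    `chosen_logps` / `rejected_logps`: (batch,) length-averaged log-probabilities of the
    chosen and rejected completions under the *current* policy. Averaging over tokens
    rather than summing keeps the weights from vanishing for long sequences.
    Computed under no_grad so the weights rescale the loss without adding gradients.
    """
    with torch.no_grad():
        # p(chosen) * p(rejected) in log space, clamped to at most 1 for stability
        return torch.clamp(torch.exp(chosen_logps + rejected_logps), max=1.0)

# Toy example: weight a per-pair DPO loss (all tensors of shape (batch,)).
beta = 0.1
chosen_logps = torch.tensor([-0.5, -2.0])      # length-averaged policy log-probs
rejected_logps = torch.tensor([-1.0, -3.0])
chosen_logratios = torch.tensor([0.8, 0.1])    # policy - reference log-ratios
rejected_logratios = torch.tensor([-0.2, 0.4])

losses = -F.logsigmoid(beta * (chosen_logratios - rejected_logratios))
loss = (wpo_pair_weights(chosen_logps, rejected_logps) * losses).mean()
```

Because the weights simply multiply the per-pair losses before reduction, the change is agnostic to which DPO loss variant is used; the merged version gates this behind a config flag (`use_weighting=True` in `DPOConfig`).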

Before submitting

Who can review?

@kashif

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@kashif (Collaborator) commented Sep 30, 2024

@gaetanlop tests for the visual-LLM models are failing, I believe due to an extra argument?

@gaetanlop (Contributor, Author)

@kashif my bad, should be fixed now

@gaetanlop (Contributor, Author)

Hey @kashif, I don’t think the failing test is related to the PR. Am I right?

@qgallouedec (Member)

> @gaetanlop tests for the visual-LLM models are failing, I believe due to an extra argument?

No, see #2147 (comment)

@qgallouedec (Member)

Please don't merge before #2131; it introduces some modifications to the documentation that I'd like this PR to take into account. I'll review this one in detail once #2131 is merged.

@gaetanlop (Contributor, Author)

Looks like #2131 has been merged. I will check if there are any modifications required.

@kashif added the ✨ enhancement (New feature or request) and 🏋 DPO (Related to DPO) labels on Oct 6, 2024
@gaetanlop (Contributor, Author)

Thanks for the great doc @kashif. After checking #2131, it looks like no updates to this PR are required.

@kashif merged commit ed9ea74 into huggingface:main on Oct 8, 2024
9 checks passed