@MCDwyer MCDwyer commented Nov 19, 2025

What does this PR do?

Adds PSPO (Probability Smoothing Policy Optimisation) as an alternative trust-region method to GRPOTrainer. PSPO smooths probabilities toward the behaviour policy instead of using ratio clipping.

Paper: https://arxiv.org/abs/2509.21282
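
For readers comparing the two trust-region options, here is a minimal per-token sketch. It assumes, based on the description above and the linked paper rather than on this PR's diff, that PSPO linearly interpolates the current policy's probability toward the behaviour policy's, so the importance ratio `r` becomes `(1 - alpha) * r + alpha`. The function names are hypothetical and this is not the code added to GRPOTrainer.

```python
def clip_surrogate(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """PPO/GRPO-style clipped surrogate for a single token."""
    clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)


def pspo_surrogate(ratio: float, advantage: float, alpha: float = 0.1) -> float:
    """Smoothed surrogate (assumed form): mix the policy with the behaviour policy.

    If pi_smooth = (1 - alpha) * pi_new + alpha * pi_old, then
    pi_smooth / pi_old = (1 - alpha) * ratio + alpha.
    """
    smoothed_ratio = (1.0 - alpha) * ratio + alpha
    return smoothed_ratio * advantage


if __name__ == "__main__":
    # Sweep a few ratios at a fixed positive advantage to compare the two objectives.
    for r in (0.5, 1.0, 1.5, 2.0):
        print(r, clip_surrogate(r, 1.0), pspo_surrogate(r, 1.0))
```

Under this assumed form, the smoothed objective stays linear in the ratio, so its gradient is scaled by `1 - alpha` everywhere, whereas the clipped objective goes flat once the ratio leaves the clip range in the direction favoured by the advantage.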

Changes:

  • Added trust_region_method parameter to GRPOConfig (default: "clip")
  • Added smoothing_alpha parameter for PSPO (default: 0.1)
  • Implemented PSPO smoothing in GRPOTrainer
  • Maintains backward compatibility
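
The two new options and their defaults can be pictured with a small stand-in dataclass. This is a sketch only: `TrustRegionSettings` is a hypothetical class, not GRPOConfig; the value `"pspo"` and the `[0, 1]` range check on `smoothing_alpha` are assumptions, since the list above only states the parameter names and defaults.

```python
from dataclasses import dataclass


@dataclass
class TrustRegionSettings:
    """Hypothetical stand-in for the two fields this PR adds to GRPOConfig."""

    trust_region_method: str = "clip"  # default stated in the PR description
    smoothing_alpha: float = 0.1       # default stated in the PR description

    def __post_init__(self) -> None:
        # Assumption: "pspo" is the alternative value; the PR text only names "clip".
        if self.trust_region_method not in ("clip", "pspo"):
            raise ValueError(f"unknown trust_region_method: {self.trust_region_method!r}")
        # Assumption: alpha is an interpolation weight, so it should lie in [0, 1].
        if not 0.0 <= self.smoothing_alpha <= 1.0:
            raise ValueError("smoothing_alpha must be between 0 and 1")


# Leaving both fields at their defaults selects clipping, which is how
# existing configurations would keep their current behaviour.
default_settings = TrustRegionSettings()
pspo_settings = TrustRegionSettings(trust_region_method="pspo", smoothing_alpha=0.1)
```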

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@kashif

@kashif kashif self-assigned this Nov 19, 2025
@qgallouedec
Member

qgallouedec commented Nov 21, 2025

Thanks! Can you apply the style (make precommit) and add a short section to the paper index documentation page?

Also, it would be nice to have a test case for this.

@MCDwyer
Author

MCDwyer commented Nov 24, 2025

I've applied the style, added the paper documentation in docs/source/gr_pspo.md, and added a test in tests/test_grpo_trainer.py. Please let me know if there is anything else I need to do.

@qgallouedec
Member

Oh, sorry, maybe I wasn't clear. You need to add a section to this part of the documentation:

https://github.com/huggingface/trl/blob/main/docs/source/paper_index.md

@MCDwyer
Author

MCDwyer commented Nov 25, 2025

Sorry, I had misunderstood. Thank you for clarifying. I've moved the documentation to a section in paper_index.md.

3 participants