feature(sunjx): add GSPO and GMPO algorithms support #22

Open
Jiaxuan-Sun wants to merge 8 commits into opendilab:main from Jiaxuan-Sun:feature/gmpo-gspo

Conversation

Jiaxuan-Sun (Contributor) commented on Jan 9, 2026

Implement GSPO and GMPO Algorithms for Policy Optimization

This PR adds two policy-optimization algorithms: GSPO (Group Sequence Policy Optimization) and GMPO (Geometric-Mean Policy Optimization).

GSPO Reference Implementation: verl

Usage: Set --advantage_estimator "gspo" or --advantage_estimator "gmpo" in training scripts.
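For readers unfamiliar with the two objectives, here is a minimal sketch of how their per-sequence surrogates differ, based on my reading of the GSPO and GMPO papers. It is not the code in this PR or in verl: the function names, the plain-list inputs, and the default clip ranges are all assumptions chosen for illustration.

```python
import math

# Illustrative sketch only: names, inputs, and default eps values are
# hypothetical and do not come from this PR or from verl.

def gspo_sequence_ratio(logp_new, logp_old):
    """Length-normalized sequence-level importance ratio.

    Equal to the geometric mean of the per-token probability ratios.
    """
    diffs = [n - o for n, o in zip(logp_new, logp_old)]
    return math.exp(sum(diffs) / len(diffs))

def gspo_surrogate(logp_new, logp_old, advantage, eps=0.2):
    """PPO-style clipped objective applied to the sequence-level ratio."""
    s = gspo_sequence_ratio(logp_new, logp_old)
    s_clipped = min(max(s, 1.0 - eps), 1.0 + eps)
    return min(s * advantage, s_clipped * advantage)

def gmpo_surrogate(logp_new, logp_old, advantage, eps=0.4):
    """Geometric mean of token-level clipped ratios, scaled by the advantage.

    Clipping is done per token in log space: the upper bound applies when
    the advantage is non-negative, the lower bound when it is negative.
    """
    diffs = [n - o for n, o in zip(logp_new, logp_old)]
    if advantage >= 0:
        clipped = [min(d, eps) for d in diffs]
    else:
        clipped = [max(d, -eps) for d in diffs]
    return math.exp(sum(clipped) / len(clipped)) * advantage

# Token log-probabilities for one sampled response (made-up numbers).
logp_old = [-1.0, -2.0]
logp_new = [0.0, -3.0]  # per-token log-ratios: +1.0 and -1.0
gspo_value = gspo_surrogate(logp_new, logp_old, advantage=2.0)
gmpo_value = gmpo_surrogate(logp_new, logp_old, advantage=2.0)
```

In this toy case the two log-ratios cancel, so the GSPO sequence ratio is exactly 1 and no clipping happens, while GMPO clips the +1.0 token to +0.4 before averaging and returns a smaller value. The point is only that GSPO clips once per sequence and GMPO clips once per token.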

⚠️ Note: The GSPO implementation follows the reference code, but in initial training experiments the model's performance does not yet match the baseline results. Further tuning may be required.

puyuan1996 changed the title from "Feature(sunjx): Implement GSPO and GMPO Algorithms for Policy Optimization" to "feature(sunjx): add GSPO and GMPO algorithms support" on Jan 9, 2026
puyuan1996 added the enhancement (New feature or request) label on Jan 9, 2026
puyuan1996 (Collaborator) commented on Jan 20, 2026

puyuan1996 mentioned this pull request on Jan 23, 2026
1 task