feature(sunjx): implement dynamic sampling strategy in DAPO #40

Open
Jiaxuan-Sun wants to merge 3 commits into opendilab:main from Jiaxuan-Sun:feature/dynamic-sampling

Conversation

@Jiaxuan-Sun
Contributor

Implement Dynamic Sampling (DAPO) for GRPO Training

This PR implements the dynamic sampling strategy from DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) to improve GRPO training efficiency.

Key Features

  • Group filtering: Filters out prompt groups in which all responses have the same metric value (all correct or all incorrect). In such a group every response's group-relative advantage is zero, so the group provides no useful gradient information for relative policy optimization.
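
As a rough illustration of the group filtering described above, here is a minimal sketch. It is not the code from this PR: the function name `filter_uniform_groups`, the `group_metrics` mapping, and the `tol` parameter are all hypothetical, and it assumes each prompt group is given as a list of per-response metric values.

```python
from typing import Dict, List, Sequence


def filter_uniform_groups(
    group_metrics: Dict[str, Sequence[float]],
    tol: float = 1e-8,
) -> List[str]:
    """Return the prompt ids whose responses do NOT all share one metric value.

    Hypothetical helper for illustration only; names and signature are assumptions.
    """
    kept = []
    for prompt_id, metrics in group_metrics.items():
        if len(metrics) == 0:
            # An empty group carries no signal either, so skip it.
            continue
        # A group is uniform when its largest and smallest metric coincide
        # (all correct or all incorrect); keep only non-uniform groups.
        if max(metrics) - min(metrics) > tol:
            kept.append(prompt_id)
    return kept


groups = {
    "p0": [1.0, 1.0, 1.0, 1.0],  # all correct: filtered out
    "p1": [0.0, 0.0, 0.0, 0.0],  # all incorrect: filtered out
    "p2": [1.0, 0.0, 1.0, 0.0],  # mixed: kept
}
print(filter_uniform_groups(groups))
```

Only the mixed group survives in this example, since it is the only one whose responses differ in metric value.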

@puyuan1996 changed the title from "Feature(sunjx): Implement Dynamic Sampling (DAPO) for GRPO Training" to "feature(sunjx): implement dynamic sampling strategy in DAPO" on Feb 10, 2026