feature(sunjx): implement dynamic sampling strategy in DAPO by Jiaxuan-Sun · Pull Request #40 · opendilab/LightRFT

Jiaxuan-Sun · 2026-02-10T03:22:18Z

Implement Dynamic Sampling (DAPO) for GRPO Training

This PR implements the dynamic sampling strategy from DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) to improve GRPO training efficiency.

Key Features

Group filtering: Drops prompt groups in which every response has the same metric value (all correct or all incorrect). In such a group every response's group-relative advantage is zero, so the group contributes no gradient signal to relative policy optimization.
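The group filtering step can be illustrated with a minimal sketch. This is not the code from this PR: the function name `filter_uniform_groups`, the `tol` parameter, and the dict-of-rewards layout are all hypothetical, chosen only to show the idea that a group is kept when its responses disagree on the metric.

```python
from typing import Dict, List, Sequence


def filter_uniform_groups(
    group_rewards: Dict[str, Sequence[float]],
    tol: float = 1e-8,  # hypothetical tolerance for "same metric value"
) -> List[str]:
    """Return the prompt ids whose responses do not all share one reward."""
    kept = []
    for prompt_id, rewards in group_rewards.items():
        if len(rewards) == 0:
            # An empty group carries no signal either; skip it.
            continue
        # A group is informative only if its rewards differ somewhere.
        if max(rewards) - min(rewards) > tol:
            kept.append(prompt_id)
    return kept


# Illustrative data, not taken from the PR.
rewards_by_prompt = {
    "p0": [1.0, 1.0, 1.0, 1.0],  # all correct: dropped
    "p1": [0.0, 0.0, 0.0, 0.0],  # all incorrect: dropped
    "p2": [1.0, 0.0, 1.0, 0.0],  # mixed: kept
}
kept_ids = filter_uniform_groups(rewards_by_prompt)
print(kept_ids)
```

Comparing the maximum and minimum reward is one simple way to detect a uniform group; checking for zero standard deviation would be equivalent for this purpose.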

Jiaxuan-Sun added 3 commits February 9, 2026 16:32

feature(sunjx): dynamic sampling

2d3651a

feature(sunjx): fix bugs

36ef5c1

feature(sunjx): pass code check

67dbd0d

puyuan1996 changed the title ~~Feature(sunjx): Implement Dynamic Sampling (DAPO) for GRPO Training~~ feature(sunjx): implement dynamic sampling strategy in DAPO Feb 10, 2026

Jiaxuan-Sun wants to merge 3 commits into opendilab:main from Jiaxuan-Sun:feature/dynamic-sampling
