feature(luyd): add partial rollout in training process #29
AltmanD wants to merge 5 commits into opendilab:main from
Conversation
lightrft/trainer/spmd_ppo_trainer.py
Outdated
| """ | ||
| Process a batch of experiences: add to replay buffer, train, and update metrics. | ||
|
|
||
| Args: |
Use the param-type style format for this docstring's Args section.
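For reference, a hedged sketch of a param-type style Args section for this method. Only the summary line comes from the diff; the parameter name `experiences` and the return annotation are assumptions for illustration:

```python
class SPMDPPOTrainerBase:
    def _process_experiences(self, experiences: list) -> dict:
        """
        Process a batch of experiences: add to replay buffer, train, and update metrics.

        Args:
            experiences (list): Rollout experiences to add to the replay
                buffer before the training step.

        Returns:
            dict: Updated training metrics for this batch.
        """
        # Body elided; this sketch only illustrates the docstring format.
        return {}
```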
lightrft/trainer/spmd_ppo_trainer.py
Outdated
# Then initialize our base class
assert "processor" in kwargs and kwargs["processor"] is not None, "processor is required for SPMDPPOTrainerVL"
SPMDPPOTrainerBase.__init__(self, *args, VLM=True, **kwargs)
if getattr(self.args, 'use_partial', False):
You can use self.args.use_partial directly; it defaults to False.
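A minimal sketch of why the getattr fallback is redundant, assuming (as the reviewer implies) the flag is registered with an explicit default in the argument parser:

```python
import argparse

# If --use_partial is registered with default=False, the attribute always
# exists on the parsed args, so getattr with a fallback adds nothing.
parser = argparse.ArgumentParser()
parser.add_argument("--use_partial", action="store_true", default=False)
args = parser.parse_args([])

assert getattr(args, "use_partial", False) == args.use_partial  # equivalent reads
assert args.use_partial is False  # default when the flag is not passed
```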
This subclass adds two key features:
1. Partial rollout: only a fraction (partial_percent) of the total rollout batch is generated
   in each call; the rest is kept in buffers.
2. Token‑budget regeneration: samples whose generation reaches max_token_budget are flagged
Introduce the meaning of --partial_percent and --max_budget, and add an overview of our partial_rollout implementation to the top of this file and to the PR description.
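For reference, the batch split that --partial_percent controls could be sketched as follows. The helper name and the list-based buffering are illustrative, not the trainer's actual code:

```python
def split_rollout(prompts: list, partial_percent: float) -> tuple:
    """Generate only a fraction of the rollout batch now; buffer the rest.

    Illustrative only: partial_percent mirrors the --partial_percent flag,
    i.e. the fraction of the rollout batch generated per call.
    """
    num_now = max(1, round(len(prompts) * partial_percent))
    return prompts[:num_now], prompts[num_now:]


generate_now, keep_buffered = split_rollout(list(range(10)), partial_percent=0.3)
# 3 prompts are generated in this call; 7 remain buffered for later calls.
```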
Add k1.5 and MiMo references.
Add partial_rollout monitoring metrics to the log and wandb for debugging and analysis.
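A sketch of what such monitoring metrics could look like. The metric names and the buffer/count inputs are assumptions, not keys from the PR; the returned dict could be passed to wandb.log or a text logger:

```python
def partial_rollout_metrics(regen_buffer: list, num_generated: int, num_reused: int) -> dict:
    """Build illustrative monitoring metrics for partial rollout.

    Hypothetical helper: tracks buffer occupancy and how much of the batch
    was freshly generated versus resumed from the regeneration buffer.
    """
    total = num_generated + num_reused
    return {
        "partial/regen_buffer_size": len(regen_buffer),
        "partial/num_generated": num_generated,
        "partial/num_reused": num_reused,
        "partial/reuse_ratio": num_reused / total if total else 0.0,
    }


metrics = partial_rollout_metrics(["s1", "s2", "s3"], num_generated=8, num_reused=2)
# e.g. wandb.log(metrics) at each rollout step.
```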
@torch.no_grad()
def _regenerate_from_buffer(self, num_needed: int, **kwargs) -> dict:
    """Regenerate outputs for samples that reached token budget."""
Use prefix caching or reuse the Session/Request ID mechanism (not prioritized now).
(Once the other items are complete, this mechanism can be considered if the performance impact proves significant.)
Implement a Staleness Threshold Mechanism (not prioritized now)
To prevent off-policy instability caused by outdated samples lingering in self.regen_buffer, we recommend enforcing a staleness threshold.
- Mechanism: If a sample's staleness exceeds the limit, it must be either discarded or prioritized for immediate completion in the next round.
- Goal: This ensures data remains consistent with the current policy, minimizing distribution shift and improving training stability.
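The discard variant of the proposed threshold could be sketched as follows. The entry layout (a per-sample "gen_step" recording the policy version that produced the partial output) and all names are assumptions for illustration:

```python
def filter_stale(regen_buffer, current_step, staleness_threshold):
    """Partition buffered samples by staleness relative to the current policy step.

    Hypothetical sketch: staleness is the gap between the current training
    step and the step at which the partial output was generated.
    """
    fresh, stale = [], []
    for entry in regen_buffer:
        staleness = current_step - entry["gen_step"]
        (stale if staleness > staleness_threshold else fresh).append(entry)
    # Stale samples are discarded here; the alternative in the comment above is
    # to prioritize them for immediate completion in the next round instead.
    return fresh, stale
```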
| """ | ||
| args = self.strategy.args | ||
| is_multimodal = all_images is not None | ||
| internvl = "internvl" in self.actor.pretrain_or_model.lower() if is_multimodal else False |
Delete the internvl-related code, and add the partial_rollout function from the latest lightrft/trainer/fast_exp_maker.py.
No description provided.