feature(luyd): add partial rollout in training process#29

Open
AltmanD wants to merge 5 commits into opendilab:main from AltmanD:dev-partial

Conversation


@AltmanD AltmanD commented Jan 22, 2026

No description provided.

@AltmanD AltmanD changed the title feature(luyd): add patrial rollout in training process feature(luyd): add partial rollout in training process Jan 22, 2026
@PaParaZz1 PaParaZz1 added the enhancement New feature or request label Jan 22, 2026
@puyuan1996 puyuan1996 mentioned this pull request Jan 23, 2026
"""
Process a batch of experiences: add to replay buffer, train, and update metrics.

Args:
Collaborator

Use the param type style format for the `Args:` section.

# Then initialize our base class
assert "processor" in kwargs and kwargs["processor"] is not None, "processor is required for SPMDPPOTrainerVL"
SPMDPPOTrainerBase.__init__(self, *args, VLM=True, **kwargs)
if getattr(self.args, 'use_partial', False):
Collaborator

You can directly use self.args.use_partial, which defaults to False.
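The reviewer's point can be illustrated with a minimal argparse sketch. The parser setup below is an assumption for illustration, not the project's actual CLI; only the flag name mirrors the trainer's option:

```python
import argparse

# Hypothetical parser mirroring the trainer's `use_partial` option.
parser = argparse.ArgumentParser()
parser.add_argument("--use_partial", action="store_true", default=False)
args = parser.parse_args([])

# Because the flag is declared with an explicit default, the attribute
# always exists on `args`, so `getattr(args, "use_partial", False)`
# can be written simply as `args.use_partial`.
print(args.use_partial)  # False
```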

This subclass adds two key features:
1. Partial rollout: only a fraction (partial_percent) of the total rollout batch is generated
in each call; the rest is kept in buffers.
2. Token‑budget regeneration: samples whose generation reaches max_token_budget are flagged
Collaborator
@puyuan1996 puyuan1996 Feb 4, 2026

Introduce the meaning of --partial_percent and --max_budget, and add an overview of our partial_rollout implementation to the top of this file and to the PR description.
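A minimal sketch of the behavior such an overview might describe, assuming --partial_percent is the fraction of the rollout batch generated per call and the remainder is buffered for later rounds. All names below are illustrative, not the PR's actual code:

```python
# Hypothetical sketch of the two documented flags:
#   partial_percent  - fraction of the rollout batch generated per call
#   max_token_budget - per-call generation cap; samples that hit it are
#                      flagged and re-queued for continuation
def partial_rollout_step(prompts, buffer, partial_percent=0.5):
    """Generate only a fraction of the batch; buffer the rest."""
    n_now = max(1, int(len(prompts) * partial_percent))
    to_generate, deferred = prompts[:n_now], prompts[n_now:]
    buffer.extend(deferred)  # deferred prompts are kept for later calls
    return to_generate

buffer = []
batch = ["p1", "p2", "p3", "p4"]
first = partial_rollout_step(batch, buffer, partial_percent=0.5)
print(first)   # ['p1', 'p2']
print(buffer)  # ['p3', 'p4']
```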

Collaborator

Add the k1.5 and MiMo references.

Collaborator

Add some monitoring metrics for partial_rollout to the logs and wandb, for debugging and analysis.
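A hedged sketch of the kind of metrics dict the reviewer may have in mind; the metric names and buffer entry layout are illustrative assumptions, not the project's actual schema:

```python
# Hypothetical partial-rollout metrics to pass to the logger / wandb.
# Assumes each regen_buffer entry records the step it was created at.
def partial_rollout_metrics(regen_buffer, current_step):
    staleness = [current_step - s["created_step"] for s in regen_buffer]
    return {
        "partial/regen_buffer_size": len(regen_buffer),
        "partial/avg_staleness": sum(staleness) / max(1, len(staleness)),
        "partial/max_staleness": max(staleness, default=0),
    }

buf = [{"created_step": 3}, {"created_step": 5}]
m = partial_rollout_metrics(buf, current_step=7)
print(m["partial/regen_buffer_size"])  # 2
print(m["partial/avg_staleness"])      # 3.0
```

Such a dict could then be logged each step alongside the usual training metrics.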


@torch.no_grad()
def _regenerate_from_buffer(self, num_needed: int, **kwargs) -> dict:
"""Regenerate outputs for samples that reached token budget."""
Collaborator

Consider prefix caching or reusing the session/request ID mechanism (not prioritized now).
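One illustrative way to think about the request-ID reuse idea, sketched with a plain dict rather than any real inference engine's API (the function, cache, and request IDs below are all assumptions for illustration):

```python
# Hypothetical sketch: keying partially generated samples by request ID
# so a later continuation resumes from cached progress instead of
# regenerating the whole output from scratch.
prefix_cache = {}

def generate_partial(request_id, prompt, new_tokens, budget=8):
    """Append up to `budget` tokens to this request's cached output."""
    state = prefix_cache.get(request_id, {"prompt": prompt, "generated": ""})
    state["generated"] += new_tokens[:budget]
    prefix_cache[request_id] = state  # persists across calls
    return state["generated"]

out1 = generate_partial("req-1", "Q: ...", "partial-")
out2 = generate_partial("req-1", "Q: ...", "answer")
print(out2)  # partial-answer
```

In a real engine the cached state would be the KV cache for the shared prefix, which is what makes resumption cheap.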

Collaborator

(After the other items are complete, this mechanism can be considered if the performance impact turns out to be significant.)

Implement a Staleness Threshold Mechanism (not prioritized now)

To prevent off-policy instability caused by outdated samples lingering in self.regen_buffer, we recommend enforcing a staleness threshold.

  • Mechanism: If a sample's staleness exceeds the limit, it must be either discarded or prioritized for immediate completion in the next round.
  • Goal: This ensures data remains consistent with the current policy, minimizing distribution shift and improving training stability.
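The proposed mechanism could be sketched as follows; the field name policy_version and the threshold value are illustrative assumptions, not the PR's actual data layout:

```python
# Hypothetical staleness filter over regen_buffer entries. Assumes each
# buffered sample records the policy version it was generated under.
def enforce_staleness_threshold(regen_buffer, current_version, max_staleness=2):
    """Split buffered samples into fresh-enough and too-stale."""
    kept, dropped = [], []
    for sample in regen_buffer:
        if current_version - sample["policy_version"] > max_staleness:
            dropped.append(sample)  # too off-policy: discard
        else:
            kept.append(sample)     # still close enough to the current policy
    return kept, dropped

buf = [{"policy_version": 1}, {"policy_version": 4}]
kept, dropped = enforce_staleness_threshold(buf, current_version=5)
print(len(kept), len(dropped))  # 1 1
```

The "prioritize for immediate completion" variant would instead move stale samples to the front of the next generation round rather than dropping them.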

"""
args = self.strategy.args
is_multimodal = all_images is not None
internvl = "internvl" in self.actor.pretrain_or_model.lower() if is_multimodal else False
Collaborator

Delete the internvl-related code, and add the partial_rollout function from the latest lightrft/trainer/fast_exp_maker.py.


Labels

enhancement New feature or request

Projects

None yet

3 participants