
refactor(sunjx): refactor loss-filter implementation #17

Open

Jiaxuan-Sun wants to merge 10 commits into opendilab:main from Jiaxuan-Sun:refactor/loss-filter

Conversation

@Jiaxuan-Sun (Contributor) commented Jan 1, 2026

Add new lightrft/trainer/filter_weight/ module with:

  • metrics.py - Metrics computation layer (entropy, difficulty, staleness, etc.)
  • filters.py - Sample filtering layer (length, reward value, entropy, difficulty filters, etc.)
  • weights.py - Loss weighting layer (length, entropy, difficulty, staleness weightings, etc.)
  • manager.py - Unified management layer (FilterWeightManager)

Note: The dynamic sampling feature has been tested. Other components are reserved for future extension.
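To make the layered design concrete, here is a minimal sketch of how the four layers could compose. This is illustrative only: the class and method names (`EntropyFilter`, `LengthWeight`, `keep`, `weight`, `process`) are hypothetical stand-ins, not the actual lightrft API; only `FilterWeightManager` is named in the PR.

```python
# Hypothetical sketch of the layered filter/weight design. All names except
# FilterWeightManager are illustrative; the real lightrft API may differ.

class EntropyFilter:
    """Filtering layer: drop samples whose entropy is below a threshold."""
    def __init__(self, min_entropy: float):
        self.min_entropy = min_entropy

    def keep(self, metrics: dict) -> bool:
        return metrics.get("entropy", 0.0) >= self.min_entropy


class LengthWeight:
    """Weighting layer: down-weight samples longer than a reference length."""
    def __init__(self, ref_len: int):
        self.ref_len = ref_len

    def weight(self, metrics: dict) -> float:
        return min(1.0, self.ref_len / max(metrics.get("length", 1), 1))


class FilterWeightManager:
    """Management layer: apply all filters, then multiply all loss weights."""
    def __init__(self, filters, weights):
        self.filters = filters
        self.weights = weights

    def process(self, samples):
        out = []
        for metrics in samples:  # metrics come from the metrics layer
            if all(f.keep(metrics) for f in self.filters):
                w = 1.0
                for wt in self.weights:
                    w *= wt.weight(metrics)
                out.append((metrics, w))
        return out


manager = FilterWeightManager(
    filters=[EntropyFilter(min_entropy=0.5)],
    weights=[LengthWeight(ref_len=100)],
)
kept = manager.process([
    {"entropy": 0.9, "length": 50},   # kept, weight 1.0
    {"entropy": 0.2, "length": 50},   # filtered out (low entropy)
    {"entropy": 0.8, "length": 200},  # kept, weight 0.5
])
print(kept)
```

The point of the layering is that metrics are computed once, while filters (hard drop) and weights (soft scaling) stay independent and composable.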

@puyuan1996 added the enhancement (New feature or request) and refactor (Cleanup, formatting, or restructuring of existing code) labels Jan 4, 2026
ret = {}
for k in all_keys:
    ret[k] = self.all_reduce(data.get(k, 0.0), op)
return ret
Collaborator:

Why was this added? Does it cause an error without it?

Contributor (Author):

This is to prevent deadlock in distributed all-reduce operations.
After dynamic sampling, the set of keys in the status dictionary may differ across ranks (some ranks have keys like kl and ptx_loss, while others do not). The all_reduce(dict) operation calls dist.all_reduce for each key individually. If the keys or their order differ between ranks, the collective operations will be inconsistent, causing the process to hang.
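The fix described above can be illustrated with a small simulation (plain Python, no real `torch.distributed` calls; per-rank dicts stand in for each rank's status dictionary). If each rank iterated only over its own keys, ranks would issue a different number and order of collective calls and hang; aligning on the sorted union of keys, with 0.0 for missing entries, keeps every rank's call sequence identical. In real code the key union would itself be gathered collectively (e.g. via an `all_gather_object`-style step).

```python
# Simulated key-aligned all-reduce: every rank reduces the same keys in the
# same order, so the collective call sequences match across ranks.

def aligned_all_reduce(per_rank_stats):
    # In distributed code this union must be agreed on by all ranks;
    # here we compute it directly from the simulated per-rank dicts.
    all_keys = sorted({k for stats in per_rank_stats for k in stats})
    world_size = len(per_rank_stats)
    # Missing keys default to 0.0, mirroring data.get(k, 0.0) in the PR.
    return {
        k: sum(stats.get(k, 0.0) for stats in per_rank_stats) / world_size
        for k in all_keys
    }

# Rank 0 has "kl" after dynamic sampling; rank 1 does not.
rank0 = {"loss": 1.0, "kl": 0.2}
rank1 = {"loss": 3.0}
print(aligned_all_reduce([rank0, rank1]))  # {'kl': 0.1, 'loss': 2.0}
```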

for exp in chunk:
    exp.action_mask = torch.zeros_like(exp.action_mask, dtype=torch.bool)
if config.dynamic_sampling and not use_dynamic_filter:
    # Legacy dynamic sampling (only if not using filter_weight framework)
Collaborator:

Why has the dynamic sampling logic become so complex? The previous implementation seemed much simpler/cleaner. Could you explain the reasoning behind this change?

Contributor (Author):

The previous version could cause deadlocks, hence the modification.

# If no valid actions or base log-probs are empty, skip KL safely.
if ((experience.action_mask is not None and experience.action_mask.sum().item() == 0)
        or (base_action_log_probs is not None and base_action_log_probs.numel() == 0)):
    kl = torch.zeros_like(
Collaborator:

Have these null-check branches actually been hit during testing? If it's null, we should probably just throw an error directly.

Contributor (Author):

Yes. A dimension-mismatch error was reported when the action_mask summed to 0 or the baseline log-probs were empty (i.e., compute_approx_kl was entered with an empty base_action_log_probs), so these branches are actually triggered in dynamic sampling and filtering scenarios.
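The failure mode can be shown with a minimal illustration (plain Python, no torch; function name and the simple log-prob-difference KL are stand-ins for the PR's compute_approx_kl): a masked reduction over zero valid actions, or a KL against an empty baseline, has nothing well-defined to compute, so the guard short-circuits to zeros instead of letting a shape mismatch surface downstream.

```python
# Illustrative guard, mirroring the PR's early-exit: if no action is valid
# or the baseline log-probs are empty, return zeros rather than computing KL.

def approx_kl(log_probs, base_log_probs, mask):
    # Guard: no valid actions, or empty baseline -> skip KL safely.
    if sum(mask) == 0 or len(base_log_probs) == 0:
        return [0.0] * len(log_probs)
    # Simple per-token log-prob difference, masked to valid actions only.
    return [m * (lp - blp) for lp, blp, m in zip(log_probs, base_log_probs, mask)]


# All actions masked out after dynamic filtering: guard returns zeros.
print(approx_kl([0.1, 0.2], [0.3, 0.4], mask=[0, 0]))  # [0.0, 0.0]
# Empty baseline (base log-probs never produced for this batch): same.
print(approx_kl([0.1, 0.2], [], mask=[1, 1]))          # [0.0, 0.0]
```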

@puyuan1996 mentioned this pull request Jan 21, 2026
