feature(wzn): add LoRA training demo for Geo3K #41

Open

zunian-wan wants to merge 4 commits into opendilab:main from zunian-wan:dev-geo3k-lora

Conversation

@zunian-wan (Contributor) commented Feb 10, 2026

📋 Summary

Purpose:
Add a demo for Geo3k training using LoRA with FSDP and SGLang

Type of Change:

  • ✨ New feature (non-breaking change which adds functionality)

🔗 Related Issues

Fixes #(issue number)
Related to #(issue number)

📝 Changes

What changed:

Why these changes:

Key implementation details:

🧪 Testing

Test Plan

  • Unit tests: Added/updated unit tests
  • Integration tests: Tested with full training pipeline
  • Manual testing: Describe what you tested manually

Test commands:

# Commands used to test the changes

Test environment:

  • Python Version:
  • PyTorch Version:
  • CUDA Version:
  • GPU Model:
  • Number of GPUs:

Test Results

Test Output
Paste test output here

Before this PR:

# Baseline metrics/behavior

After this PR:

# New metrics/behavior

📊 Performance Impact

  • No performance impact
  • Performance improved:
  • Performance regression:

Benchmark results (if applicable):

Baseline: X samples/sec, Y GB memory
After PR: X samples/sec, Y GB memory

📚 Documentation

  • Docstrings updated for new/modified functions
  • README.md updated (if user-facing changes)
  • Documentation in docs/ updated (if applicable)
  • Examples updated/added (if applicable)
  • Configuration reference updated (if new parameters added)
  • CHANGELOG.md updated

✅ Checklist

Code Quality

  • Code follows the project's style guidelines (run make format and make fcheck)
  • Self-review of code completed
  • Code is well-commented, especially in complex areas
  • No unnecessary debug logs or commented-out code

Compatibility

  • Changes are backward compatible (or breaking changes are documented)
  • Existing tests pass with changes
  • No new warnings introduced

Testing

  • Tested with FSDP (if applicable)
  • Tested with DeepSpeed (if applicable)
  • Tested with inference engines (vLLM/SGLang) (if applicable)
  • Tested on multiple GPU configurations (if applicable)

Documentation

  • All public APIs are documented
  • User-facing changes are documented
  • Migration guide provided (if breaking changes)

🎯 Algorithm/Model Specific (if applicable)

New Algorithm:

  • Algorithm implementation follows existing patterns
  • Algorithm is configurable via CLI arguments
  • Example training script provided
  • Algorithm documentation added to docs/source/quick_start/algorithms.md

New Model Support:

  • Model architecture properly integrated
  • Tested with representative datasets
  • Model-specific documentation added

💭 Additional Notes

🔍 Review Checklist for Maintainers

  • Code quality and style verified
  • Tests are adequate and passing
  • Documentation is complete and clear
  • Performance impact is acceptable
  • Breaking changes are properly documented
  • Ready to merge

📝 Summary of Changes

- Implement LoRA-aware model saving in FSDPV2Strategy, supporting HF/PEFT `save_pretrained` for adapters.
- Add LoRA merging/unmerging logic in BroadcastManager so that inference engines receive the effective (merged) weights during synchronization.
- Optimize checkpointing in PPOTerVL to prioritize HF adapter saving for LoRA runs.
- Add `run_grpo_geo3k_lora_qwen2.5_vl_7b.sh` as a reference LoRA training script.
- Improve weight-name mapping for SGLang to handle PEFT-wrapped module names and `base_layer` stripping.
- Add rotation logic for HF/LoRA adapters in the PPO/SPMD trainers to honor the `max_ckpt_num` parameter.
- Sync the cleanup mechanism with the `save_ckpt` implementation in `FSDPV2Strategy`.
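The merge/unmerge step described above can be illustrated with a minimal numeric sketch (plain floats stand in for weight tensors; the real code relies on PEFT's `merge_adapter`/`unmerge_adapter`). The weight the inference engine should see is `W + (alpha/rank) * B·A`, and unmerging restores the frozen base weight for continued training:

```python
# Minimal numeric sketch of LoRA merge/unmerge around weight sync.
# Plain floats stand in for weight tensors; all names are illustrative.

def lora_merge(w, a, b, alpha, rank):
    """Effective weight the inference engine should see: W + (alpha/rank) * B*A."""
    return w + (alpha / rank) * (b * a)

def lora_unmerge(w_eff, a, b, alpha, rank):
    """Restore the frozen base weight after synchronization."""
    return w_eff - (alpha / rank) * (b * a)

base_w, lora_a, lora_b = 1.0, 0.5, 0.2
merged = lora_merge(base_w, lora_a, lora_b, alpha=256, rank=128)
restored = lora_unmerge(merged, lora_a, lora_b, alpha=256, rank=128)
print(merged, restored)
```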
@zunian-wan changed the title from "Add" to "Add Geo3K training demo using LoRA" on Feb 10, 2026
@zunian-wan changed the title from "Add Geo3K training demo using LoRA" to "Add Geo3K LoRA training demo" on Feb 10, 2026
@zunian-wan changed the title from "Add Geo3K LoRA training demo" to "Add LoRA training demo for Geo3K" on Feb 10, 2026
@puyuan1996 added the "enhancement" (New feature or request) label on Feb 10, 2026
@puyuan1996 changed the title from "Add LoRA training demo for Geo3K" to "feature(wzn): add LoRA training demo for Geo3K" on Feb 10, 2026
…ture and improve LoRA training documentation in running script
# Main modifications for LoRA:
# - Parameter Efficiency: Significantly reduces VRAM usage for 7B+ models.
# - Targeted Adaptation: Adapts all linear layers to maintain reasoning power.
Reviewer (Member):

maintain necessary model capacity

GENERATE_MAX_LEN=2048 # Max length of the generated response.
LORA_RANK=128 # LoRA rank.
LORA_ALPHA=256 # LoRA alpha.
LORA_DROPOUT=0.1 # LoRA dropout rate.
Reviewer (Member):

use default value for this argument, we seldom modify this value
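One way to follow this suggestion is to keep such knobs as CLI arguments with sensible defaults, so the launch script only sets values it actually changes. A sketch with hypothetical argument names (the defaults mirror the script's values and are illustrative):

```python
import argparse

# Hypothetical LoRA argument group; the defaults below are illustrative,
# chosen to mirror the values in the launch script.
parser = argparse.ArgumentParser()
parser.add_argument("--lora-rank", type=int, default=128)
parser.add_argument("--lora-alpha", type=int, default=256)
parser.add_argument("--lora-dropout", type=float, default=0.1)

args = parser.parse_args([])  # no overrides: defaults apply
print(args.lora_rank, args.lora_alpha, args.lora_dropout)  # 128 256 0.1
```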


# --- Single-Node Distributed Setup ---
# Update these if you are running in a multi-node environment.
export MLP_WORKER_NUM=1 # Number of nodes.
Reviewer (Member):

simplify this part, we don't need MLP_XXX

ema_model if args.enable_ema else actor,
tokenizer,
args.save_path,
args.save_path + "/final_model",
Reviewer (Member):

use os.path.join, do we need suffix here?
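The `os.path.join` suggestion avoids manual separator concatenation; a minimal sketch (the `/tmp/ckpt` path stands in for `args.save_path`):

```python
import os

save_path = "/tmp/ckpt"  # illustrative stand-in for args.save_path
final_dir = os.path.join(save_path, "final_model")
print(final_dir)  # /tmp/ckpt/final_model on POSIX systems
```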

max_num = getattr(args, "max_ckpt_num", 3)
while True:
    subdirs = sorted(
        [(os.path.join(args.ckpt_path, d), os.path.getmtime(os.path.join(args.ckpt_path, d)))
Reviewer (Member):

use a temporal variable to denote args.ckpt_path
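The rotation loop, rewritten with a local variable for the checkpoint path as suggested, might look like the following sketch (it assumes checkpoints are plain subdirectories ordered by modification time; all names are illustrative):

```python
import os
import shutil
import tempfile

def rotate_checkpoints(ckpt_path, max_ckpt_num=3):
    """Delete the oldest checkpoint dirs until at most `max_ckpt_num` remain."""
    subdirs = sorted(
        (os.path.join(ckpt_path, d) for d in os.listdir(ckpt_path)
         if os.path.isdir(os.path.join(ckpt_path, d))),
        key=os.path.getmtime,
    )
    for stale in subdirs[:-max_ckpt_num]:  # assumes max_ckpt_num >= 1
        shutil.rmtree(stale)

# Demo: five fake checkpoints rotated down to the three newest.
root = tempfile.mkdtemp()
for i in range(5):
    p = os.path.join(root, f"step_{i}")
    os.mkdir(p)
    os.utime(p, (i, i))  # deterministic mtimes for the demo
rotate_checkpoints(root, max_ckpt_num=3)
print(sorted(os.listdir(root)))  # ['step_2', 'step_3', 'step_4']
```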

"""
self.print("FSDP save model is not implemented, please use offline tools to convert to huggingface model")
# Determine the model to save (unwrap ActorVL or similar wrappers)
actual_model = model.model if is_actor(model) or hasattr(model, "model") else model
Reviewer (Member):

can we only use hasattr(model, "model") here

self.inference_engine.update_weights_from_tensor(
    sglang_name, param.data, flush_cache=(count == num_params)
)
if ".lora_" in name:
Reviewer (Member):

Why is 'continue' used here? It would prevent the LoRA parameters from being transferred from training to inference engine.
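If the merge-before-sync scheme from the PR description is in effect, skipping the raw `.lora_` tensors is intentional: after `merge_adapter()` their delta is already folded into the base weights, and the inference engine has no `lora_A`/`lora_B` parameters to receive them. A name-filtering sketch (parameter names are illustrative; the `.base_layer.` stripping mirrors the PR's weight-mapping change):

```python
# After merging, only base weights are synced; PEFT wrapper names are
# normalized by dropping the ".base_layer." segment. Names are illustrative.
names = [
    "model.layers.0.self_attn.q_proj.base_layer.weight",
    "model.layers.0.self_attn.q_proj.lora_A.default.weight",
    "model.layers.0.self_attn.q_proj.lora_B.default.weight",
]

effective = [
    n.replace(".base_layer.", ".")
    for n in names
    if ".lora_" not in n  # raw adapter tensors are skipped ('continue')
]
print(effective)  # ['model.layers.0.self_attn.q_proj.weight']
```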

is_peft = hasattr(self.actor.model, "merge_adapter")
if is_peft:
    self.strategy.print("Merging LoRA adapters for weight synchronization...")
    self.actor.model.merge_adapter()
Reviewer (Member):

add an else branch to raise RuntimeError
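The requested else branch makes the failure mode explicit rather than silently syncing un-merged weights; a sketch with a stub model (all names are illustrative):

```python
class _PeftStub:
    """Stub standing in for a PEFT-wrapped actor model."""
    merged = False
    def merge_adapter(self):
        self.merged = True

def merge_for_sync(model):
    # Fail fast instead of silently syncing un-merged weights.
    if hasattr(model, "merge_adapter"):
        model.merge_adapter()
    else:
        raise RuntimeError("LoRA weight sync requested, but the model is not PEFT-wrapped")

m = _PeftStub()
merge_for_sync(m)
print(m.merged)  # True

try:
    merge_for_sync(object())
except RuntimeError as e:
    print("raised:", e)
```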

"""
model = self.actor.model
count, num_params = 0, len(list(model.named_parameters()))
param_dict = dict(model.named_parameters())
Reviewer (Member):

use OrderedDict


# Broadcast to engine
if self.strategy.engine_type == "vllm":
    vllm_name = self._map_weight_name_for_sglang(effective_name)
Reviewer (Member):

why sglang method here
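The reviewer's point is that the vLLM branch calls the SGLang-specific mapper; one fix is an explicit per-engine dispatch. A sketch with hypothetical function names (the `.base_layer.` stripping mirrors the PEFT unwrapping described in the PR; other per-engine rules are placeholders):

```python
# Hypothetical per-engine weight-name mappers. The ".base_layer." stripping
# mirrors the PEFT unwrapping described in the PR; other rules are placeholders.
def map_name_for_sglang(name):
    return name.replace(".base_layer.", ".")

def map_name_for_vllm(name):
    return name.replace(".base_layer.", ".")  # vLLM may need different rules

def map_weight_name(engine_type, name):
    if engine_type == "sglang":
        return map_name_for_sglang(name)
    if engine_type == "vllm":
        return map_name_for_vllm(name)
    raise ValueError(f"unsupported engine type: {engine_type}")

print(map_weight_name("vllm", "q_proj.base_layer.weight"))  # q_proj.weight
```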
