[algo, rollout, sglang] feat: Support router replay with sglang #4840
Conversation
Code Review
This pull request adds support for router replay with sglang, which is useful for analyzing and debugging MoE models. The changes involve enabling the feature in the sglang server configuration and handling the routed_experts data in the agent loop and sglang server.
My review has identified a critical bug that could lead to a runtime crash due to an unhandled type for routed_experts. I've also pointed out a high-severity issue related to a local import that could cause runtime failures if an incompatible version of sglang is used. I've provided suggestions to fix both issues to improve the robustness of this new feature.
```python
else:
    from sglang.srt.layers.moe.routed_experts_capturer import extract_routed_experts_from_meta_info

    routed_experts = extract_routed_experts_from_meta_info(output).reshape(
        -1, self.model_config.hf_config.num_hidden_layers, self.model_config.hf_config.num_experts_per_tok
    )
```
The local import of extract_routed_experts_from_meta_info inside the generate method can lead to a runtime ImportError if this function is not available in the installed sglang version. This can make debugging difficult as the dependency is not explicit at the module level. It's better to handle this dependency explicitly.
Consider moving the import to the top of the file and wrapping it in a try...except block. This makes the dependency clear and allows for a more graceful failure if the required sglang version is not installed.
You can add the following at the top of the file:
```python
try:
    from sglang.srt.layers.moe.routed_experts_capturer import extract_routed_experts_from_meta_info
except ImportError:
    extract_routed_experts_from_meta_info = None
```

Then, you can modify the logic in `generate` to check if the import was successful:
```diff
         else:
-            from sglang.srt.layers.moe.routed_experts_capturer import extract_routed_experts_from_meta_info
-
+            if extract_routed_experts_from_meta_info is None:
+                raise ImportError(
+                    "`extract_routed_experts_from_meta_info` is not available. "
+                    "Please check your sglang installation or version."
+                )
             routed_experts = extract_routed_experts_from_meta_info(output).reshape(
                 -1, self.model_config.hf_config.num_hidden_layers, self.model_config.hf_config.num_experts_per_tok
             )
```
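The sentinel-import pattern proposed above can be demonstrated in isolation. This is a minimal sketch, not the PR's code; the module name `nonexistent_optional_dep` and the function names are made up to stand in for the optional sglang dependency:

```python
# Sentinel-import pattern: try the optional import once at module load,
# fall back to None, and fail with a clear message only at call time.
try:
    from nonexistent_optional_dep import extract_fn  # hypothetical optional dependency
except ImportError:
    extract_fn = None


def extract(output):
    # Raise an explicit, actionable error instead of a NameError deep in generate().
    if extract_fn is None:
        raise ImportError("extract_fn is not available; check your installation or version.")
    return extract_fn(output)
```

The benefit over a bare local import is that the dependency is visible at the top of the file, and the failure message names the feature rather than surfacing an arbitrary `ImportError` mid-rollout.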
Pull request overview
This PR adds support for router replay functionality with SGLang backend, enabling the capture and replay of MoE (Mixture of Experts) routing decisions during rollout generation.
Key changes:
- Enable returning routed experts information from SGLang server when the feature is enabled
- Extract and reshape routed experts data from generation output
- Handle both numpy array and torch tensor types for routed experts in agent loop processing
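The dual-type handling in the last point can be sketched as follows. This is an illustration rather than the PR's actual code, and the helper name `as_torch_tensor` is made up:

```python
import numpy as np
import torch


def as_torch_tensor(routed_experts):
    """Normalize routed_experts to a torch.Tensor, accepting either backend type."""
    if isinstance(routed_experts, torch.Tensor):
        return routed_experts
    if isinstance(routed_experts, np.ndarray):
        # Zero-copy conversion; the tensor shares memory with the array.
        return torch.from_numpy(routed_experts)
    raise TypeError(f"Unexpected routed_experts type: {type(routed_experts)}")
```

Normalizing at the boundary keeps the downstream agent-loop code free of per-call-site `isinstance` checks.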
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| verl/workers/rollout/sglang_rollout/async_sglang_server.py | Adds configuration to enable routed experts return in SGLang, extracts and reshapes routed experts data from generation output |
| verl/experimental/agent_loop/agent_loop.py | Adds type checking to handle routed experts as either numpy array or torch tensor |
```python
from sglang.srt.layers.moe.routed_experts_capturer import extract_routed_experts_from_meta_info

routed_experts = extract_routed_experts_from_meta_info(output).reshape(
    -1, self.model_config.hf_config.num_hidden_layers, self.model_config.hf_config.num_experts_per_tok
```
**Copilot AI** commented on Jan 8, 2026
The import statement is placed inside the conditional block, which is good for avoiding unnecessary imports when the feature is disabled. However, this import is from an external library (sglang) which may not have a stable API. Consider adding a try-except ImportError block around this import to provide a clearer error message if the sglang version doesn't support this function, since this is a relatively new feature.
```diff
-                from sglang.srt.layers.moe.routed_experts_capturer import extract_routed_experts_from_meta_info
+                try:
+                    from sglang.srt.layers.moe.routed_experts_capturer import (
+                        extract_routed_experts_from_meta_info,
+                    )
+                except ImportError as e:
+                    raise ImportError(
+                        "Failed to import 'extract_routed_experts_from_meta_info' from "
+                        "'sglang.srt.layers.moe.routed_experts_capturer'. This feature "
+                        "requires a version of 'sglang' that supports routed expert "
+                        "capturing. Please upgrade 'sglang' or disable "
+                        "'enable_rollout_routing_replay' in the rollout configuration."
+                    ) from e
                 routed_experts = extract_routed_experts_from_meta_info(output).reshape(
-                    -1, self.model_config.hf_config.num_hidden_layers, self.model_config.hf_config.num_experts_per_tok
+                    -1,
+                    self.model_config.hf_config.num_hidden_layers,
+                    self.model_config.hf_config.num_experts_per_tok,
```
```python
    routed_experts = output.get("meta_info", {}).get("routed_experts", None)
else:
    from sglang.srt.layers.moe.routed_experts_capturer import extract_routed_experts_from_meta_info

    routed_experts = extract_routed_experts_from_meta_info(output).reshape(
        -1, self.model_config.hf_config.num_hidden_layers, self.model_config.hf_config.num_experts_per_tok
```
**Copilot AI** commented on Jan 8, 2026
When skip_tokenizer_init is True, the code uses .get() with a default of None for routed_experts, but there's no validation to ensure the data is in the expected format if it exists. When the data exists, it should likely have the same shape requirements as in the else branch. Consider adding validation or reshaping logic for this path as well to ensure consistency.
```diff
-            routed_experts = output.get("meta_info", {}).get("routed_experts", None)
+            meta_info = output.get("meta_info", {}) or {}
+            routed_experts_raw = meta_info.get("routed_experts", None)
+            if routed_experts_raw is not None:
+                try:
+                    routed_experts_tensor = torch.as_tensor(routed_experts_raw)
+                    routed_experts = routed_experts_tensor.reshape(
+                        -1,
+                        self.model_config.hf_config.num_hidden_layers,
+                        self.model_config.hf_config.num_experts_per_tok,
+                    )
+                except (RuntimeError, ValueError) as e:
+                    logger.warning(
+                        "Failed to reshape routed_experts from meta_info with skip_tokenizer_init=True: %s",
+                        e,
+                    )
+                    routed_experts = None
         else:
             from sglang.srt.layers.moe.routed_experts_capturer import extract_routed_experts_from_meta_info
             routed_experts = extract_routed_experts_from_meta_info(output).reshape(
-                -1, self.model_config.hf_config.num_hidden_layers, self.model_config.hf_config.num_experts_per_tok
+                -1,
+                self.model_config.hf_config.num_hidden_layers,
+                self.model_config.hf_config.num_experts_per_tok,
```
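The reshape target used in both branches can be illustrated with toy dimensions. The numbers below are made up for illustration and stand in for the `hf_config` values of a real MoE model:

```python
import torch

# Assumed toy model dimensions (stand-ins for hf_config values)
num_hidden_layers = 4
num_experts_per_tok = 2
num_tokens = 3

# A flat capture buffer holding one expert id per (token, layer, top-k slot)
flat = torch.arange(num_tokens * num_hidden_layers * num_experts_per_tok)

# The -1 dimension lets reshape infer the token count from the flat length
routed_experts = flat.reshape(-1, num_hidden_layers, num_experts_per_tok)
# routed_experts.shape is (num_tokens, num_hidden_layers, num_experts_per_tok)
```

Indexing `routed_experts[t, l]` then yields the top-k expert ids chosen for token `t` at layer `l`, which is the layout the replay logic needs.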
|
@moehanabi Thanks for your PR, could you also provide a training script and report some experiment results?

Hi, I have updated an example and a README doc. I wonder what kind of experiment results you want, since sgl-project/sglang@bed301a did not supply experiment results either, and it is a change of function rather than performance.
Oh, I mean experiment results: train/val reward metrics with and without router replay, not efficiency performance (MFU).
What does this PR do?
Support router replay with sglang
Checklist Before Starting

- Format the PR title as `[{modules}] {type}: {description}` (this will be checked by the CI).
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`, like `[megatron, fsdp, doc]`.
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`.
- If the PR introduces breaking changes, add `[BREAKING]` to the beginning of the title, e.g. `[BREAKING][fsdp, megatron] feat: dynamic batching`.

Test
API and Usage Example
You may want to use this together with sgl-project/sglang#15751 if you want to set `chunked_prefill_size = -1`.
Design & Code Changes
Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- Run `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`.
- Request CI via the `ci-request` channel in the `verl` Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
- If the PR changes the `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`.