-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[sglang] feat: Add SGLang async multi-turn rollout with tool support #1037
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sglang] feat: Add SGLang async multi-turn rollout with tool support #1037
Conversation
…ate machine with function call parser
…s_mask_and_tool_config
support loss_mask and loading tool from config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just waiting for CI to complete
Great! |
…olcengine#1037) A redesigned version of volcengine#917 ## Current Status [Develop log & Tracker](zhaochenyang20/Awesome-ML-SYS-Tutorial#113) **What Has Been Done** - Async Rollout Refactoring: Integrate with the tool server to coordinate tool calls during generation, leveraging request IDs for state and progress tracking, support async multi-turn conversations in Agentic RL training (with Tool support). - Async Request Management: Encapsulate rollout requests into a unified structure, enabling efficient tracking and handling of concurrent multi-turn dialogues with chatml style messages. - Extensible Tools: A modular design for adapt tools in OpenAIFunctionTool format which is both support by SGLang and vLLM, with create separate instance, execute when tool call, calc score according to tool env state and release resource. - Multi-turn support has been implemented for the GSM8K task (new version working on). However, training has not yet converged, and we hope the community could join to investigate the issue. **What Is WIP** - [x] Merge loss mask to training process from last version - [x] Add more user friendly tool config and e2e tests for gsm8k with tool training - [ ] We are going to validate our multiturn feature in open-source sandbox environments. ## Key Features will be introduced in future version - Integrate a Ray-based agent trainer to enable explicit separation of the rollout and training pipeline. Provide support for partial rollout handling and fine-grained request state management. - Extend the framework to support simulated user interactions (e.g., roleplay, interactive feedback) and more complex environment-in-the-loop RL tasks. **Future Plan** [Discussion Thread](zhaochenyang20/Awesome-ML-SYS-Tutorial#74 (comment)) [RFC doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md) will be updated soon. ## Contributors & Acknowledgement - Xiang Long [mid.of.change@gmail.com](mailto:mid.of.change@gmail.com) @SwordFaith (Design RFC & core-dev of refactor part) - Yuzhen Zhou [zyzshishui@gmail.com](mailto:zyzshishui@gmail.com) @zyzshishui (Core-dev) - Chenyang Zhao [zhaochen20@outlook.com](mailto:zhaochen20@outlook.com) @zhaochenyang20 (PM) - Guanhua Wang @WANG-GH - Junrong Lin @ocss884 (verl-sglang support) - Hanchen Zhang [zhanghanchen77@gmail.com](mailto:zhanghanchen77@gmail.com) - Haoran Wang [ubecwang@gmail.com](mailto:ubecwang@gmail.com) - Rui Lu [learningrate1@gmail.com](mailto:learningrate1@gmail.com) - Yujiang Li [liyujiang2020@gmail.com](mailto:liyujiang2020@gmail.com) - Jiajun Li [guapisolo@gmail.com](mailto:guapisolo@gmail.com) - Jin Pan [jpan236@wisc.edu](mailto:jpan236@wisc.edu) - Zhi Zheng [zhengzhi@modelbest.cn](mailto:zhengzhi@modelbest.cn) @zh-zheng --------- Co-authored-by: zyzshishui <492129152@qq.com> Co-authored-by: guanhua <281484683@qq.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> Co-authored-by: ocss884 <ocss.lin@gmail.com> Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com> Co-authored-by: HL <linhaibin.eric@gmail.com>
Hello, I used verl/examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn.sh for training, and the grad_norm became "nan" starting from the second step and afterwards. The specific log is as follows:
The versions of my Python packages are as follows:
My CUDA version is Cuda compilation tools, release 12.4, V12.4.131. The specific execution command is:
I sincerely hope to get some help. Thank you very much! |
Thanks for sharing your settings with early nan. It appears to have some instability during the training process. We'll increase monitoring in wandb and examine it closely. |
…olcengine#1037) A redesigned version of volcengine#917 ## Current Status [Develop log & Tracker](zhaochenyang20/Awesome-ML-SYS-Tutorial#113) **What Has Been Done** - Async Rollout Refactoring: Integrate with the tool server to coordinate tool calls during generation, leveraging request IDs for state and progress tracking, support async multi-turn conversations in Agentic RL training (with Tool support). - Async Request Management: Encapsulate rollout requests into a unified structure, enabling efficient tracking and handling of concurrent multi-turn dialogues with chatml style messages. - Extensible Tools: A modular design for adapt tools in OpenAIFunctionTool format which is both support by SGLang and vLLM, with create separate instance, execute when tool call, calc score according to tool env state and release resource. - Multi-turn support has been implemented for the GSM8K task (new version working on). However, training has not yet converged, and we hope the community could join to investigate the issue. **What Is WIP** - [x] Merge loss mask to training process from last version - [x] Add more user friendly tool config and e2e tests for gsm8k with tool training - [ ] We are going to validate our multiturn feature in open-source sandbox environments. ## Key Features will be introduced in future version - Integrate a Ray-based agent trainer to enable explicit separation of the rollout and training pipeline. Provide support for partial rollout handling and fine-grained request state management. - Extend the framework to support simulated user interactions (e.g., roleplay, interactive feedback) and more complex environment-in-the-loop RL tasks. **Future Plan** [Discussion Thread](zhaochenyang20/Awesome-ML-SYS-Tutorial#74 (comment)) [RFC doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md) will be updated soon. ## Contributors & Acknowledgement - Xiang Long [mid.of.change@gmail.com](mailto:mid.of.change@gmail.com) @SwordFaith (Design RFC & core-dev of refactor part) - Yuzhen Zhou [zyzshishui@gmail.com](mailto:zyzshishui@gmail.com) @zyzshishui (Core-dev) - Chenyang Zhao [zhaochen20@outlook.com](mailto:zhaochen20@outlook.com) @zhaochenyang20 (PM) - Guanhua Wang @WANG-GH - Junrong Lin @ocss884 (verl-sglang support) - Hanchen Zhang [zhanghanchen77@gmail.com](mailto:zhanghanchen77@gmail.com) - Haoran Wang [ubecwang@gmail.com](mailto:ubecwang@gmail.com) - Rui Lu [learningrate1@gmail.com](mailto:learningrate1@gmail.com) - Yujiang Li [liyujiang2020@gmail.com](mailto:liyujiang2020@gmail.com) - Jiajun Li [guapisolo@gmail.com](mailto:guapisolo@gmail.com) - Jin Pan [jpan236@wisc.edu](mailto:jpan236@wisc.edu) - Zhi Zheng [zhengzhi@modelbest.cn](mailto:zhengzhi@modelbest.cn) @zh-zheng --------- Co-authored-by: zyzshishui <492129152@qq.com> Co-authored-by: guanhua <281484683@qq.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> Co-authored-by: ocss884 <ocss.lin@gmail.com> Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com> Co-authored-by: HL <linhaibin.eric@gmail.com>
Same issue here. Any update on how this is solved? Thanks! |
After several days of investigation, we identified the issue might be caused by an excessively small log_p value. You can check the wandb log for details. The log_p value around -32 corresponds to a probability of approximately 1e-14, which is significantly lower than the typical bfloat16 epsilon of 1e-6. This discrepancy leads to numerical instability. To address this, we are constructing an SFT dataset and initializing a cold-start model to resolve the issue. |
Thank you for your prompt response. Following your suggestion, I also reviewed my training logs. I am training the qwen-7b-instruct model on a custom dataset that requires tool calling, and I have encountered a similar issue where the log_p values are very small. Here are some of the metrics from my training log:
I would like to know if constructing an SFT dataset for cold start is the only solution to this problem? Additionally, I noticed in the training logs provided in the verl multi-turn rollout documentation (https://wandb.ai/zhaochenyang20/gsm8k_async_rl/runs/1ro1r7om?nw=nwuserzhaochenyang20), which is for a 3B model trained on GSM8K, there is no issue with grad_norm becoming NaN. Why is that the difference? |
I apologize for the earlier assumption that the log_prob issue was solely based on differences in stable math/code training, as it is not the root cause. After testing with cold start SFT variants, we found that adding cold start SFT significantly increases log_prob_min and reduces some occurrences of NaN, but it does not completely resolve the issue. During this process, we identified and fixed a few bugs in the previous implementation, which seem to have resolved the NaN issue in the new Weights & Biases (wandb) logs and the new PR #1475. If you could help reproduce the results, it would be greatly appreciated. We sincerely apologize for the bugs in the earlier development release that caused any inconvenience. |
Thank you very much! After running with your new PR #1475, the NaN issue of my experiments is fixed. |
Thank you very much, but it still doesn't seem to work and doesn't solve the problem of the model training crash. 非常感谢,但好像还是不起作用,没有解决模型训练崩溃的问题。我拉取了verl最新版本的代码,没有进行任何修改,跑官方gsm8k with tool的例子,模型仍然会训崩,请问有什么解决办法。下面是我wandb日志 |
Hello, may I ask if this PR supports the use of multi-turn + tools in multimodal models? If it is supported, how should I use it? And how can I integrate my own image processing tools into it? |
We are currently addressing some issues that need to be resolved for the Qwen 2.5 VL training, which is part of our Update 2. For more details, you can refer to this link: zhaochenyang20/Awesome-ML-SYS-Tutorial#132. If you're interested in collaborating with us, feel free to add me on WeChat. My ID is swordfaith. |
…olcengine#1037) A redesigned version of volcengine#917 ## Current Status [Develop log & Tracker](zhaochenyang20/Awesome-ML-SYS-Tutorial#113) **What Has Been Done** - Async Rollout Refactoring: Integrate with the tool server to coordinate tool calls during generation, leveraging request IDs for state and progress tracking, support async multi-turn conversations in Agentic RL training (with Tool support). - Async Request Management: Encapsulate rollout requests into a unified structure, enabling efficient tracking and handling of concurrent multi-turn dialogues with chatml style messages. - Extensible Tools: A modular design for adapt tools in OpenAIFunctionTool format which is both support by SGLang and vLLM, with create separate instance, execute when tool call, calc score according to tool env state and release resource. - Multi-turn support has been implemented for the GSM8K task (new version working on). However, training has not yet converged, and we hope the community could join to investigate the issue. **What Is WIP** - [x] Merge loss mask to training process from last version - [x] Add more user friendly tool config and e2e tests for gsm8k with tool training - [ ] We are going to validate our multiturn feature in open-source sandbox environments. ## Key Features will be introduced in future version - Integrate a Ray-based agent trainer to enable explicit separation of the rollout and training pipeline. Provide support for partial rollout handling and fine-grained request state management. - Extend the framework to support simulated user interactions (e.g., roleplay, interactive feedback) and more complex environment-in-the-loop RL tasks. **Future Plan** [Discussion Thread](zhaochenyang20/Awesome-ML-SYS-Tutorial#74 (comment)) [RFC doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md) will be updated soon. ## Contributors & Acknowledgement - Xiang Long [mid.of.change@gmail.com](mailto:mid.of.change@gmail.com) @SwordFaith (Design RFC & core-dev of refactor part) - Yuzhen Zhou [zyzshishui@gmail.com](mailto:zyzshishui@gmail.com) @zyzshishui (Core-dev) - Chenyang Zhao [zhaochen20@outlook.com](mailto:zhaochen20@outlook.com) @zhaochenyang20 (PM) - Guanhua Wang @WANG-GH - Junrong Lin @ocss884 (verl-sglang support) - Hanchen Zhang [zhanghanchen77@gmail.com](mailto:zhanghanchen77@gmail.com) - Haoran Wang [ubecwang@gmail.com](mailto:ubecwang@gmail.com) - Rui Lu [learningrate1@gmail.com](mailto:learningrate1@gmail.com) - Yujiang Li [liyujiang2020@gmail.com](mailto:liyujiang2020@gmail.com) - Jiajun Li [guapisolo@gmail.com](mailto:guapisolo@gmail.com) - Jin Pan [jpan236@wisc.edu](mailto:jpan236@wisc.edu) - Zhi Zheng [zhengzhi@modelbest.cn](mailto:zhengzhi@modelbest.cn) @zh-zheng --------- Co-authored-by: zyzshishui <492129152@qq.com> Co-authored-by: guanhua <281484683@qq.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> Co-authored-by: ocss884 <ocss.lin@gmail.com> Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com> Co-authored-by: HL <linhaibin.eric@gmail.com>
A redesigned version of #917
Current Status
Develop log & Tracker
What Has Been Done
What Is WIP
Key Features will be introduced in future version
Future Plan
Discussion Thread
RFC doc will be updated soon.
Contributors & Acknowledgement