[V1] AsyncLLM Implementation #9826
Conversation
@robertgshaw2-neuralmagic I'm glad to see your optimized PR. I found some problems during testing and wanted to ask for advice. I tested llama2-7b on 1 GPU with batch=256 using the V1 engine, and compared it against your PR; the token-gap breakdown is as follows. I am very happy that the new implementation has removed the token enqueue and dequeue time, but I found that the new version's update_schedule and schedule take longer, so there is no major change in the total gap time.
Hey @lixiaolx - thanks for taking a look. I am having a hard time understanding your analysis - could you clarify?
Thanks @lixiaolx, nice profiles! What you observe is not unexpected, since the scheduling logic currently contends for the GIL with the IPC message serialization/deserialization. Our intention is to improve this very soon, but doing the IPC work in a separate thread is still a big win as a first step, since much of that work overlaps with parts of the critical loop that don't contend for the GIL, primarily the forward pass on the GPU.
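The separate-thread IPC idea described above can be sketched in pure Python. This is a hypothetical illustration, not vLLM's actual code: a background thread drains a queue and does the (GIL-holding) serialization work, which can overlap with GIL-releasing work such as the GPU forward pass in the main loop.

```python
import pickle
import queue
import threading

class OutputSerializer:
    """Hypothetical sketch: serialize engine outputs on a background thread
    so the main loop can overlap this work with the GPU forward pass."""

    def __init__(self):
        self._in = queue.Queue()
        self._out = queue.Queue()
        # Daemon thread so it doesn't block interpreter shutdown.
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            item = self._in.get()
            if item is None:  # sentinel to stop the worker
                break
            # pickle.dumps holds the GIL, but runs off the critical path here.
            self._out.put(pickle.dumps(item))

    def submit(self, outputs):
        """Hand outputs to the worker without blocking the main loop."""
        self._in.put(outputs)

    def get_serialized(self):
        """Block until the next serialized message is ready to send over IPC."""
        return self._out.get()
```

The names (`OutputSerializer`, `submit`, `get_serialized`) are made up for this sketch; the real implementation also has to handle deserialization of inputs and backpressure.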
Thank you very much for your answer. I tried to compare this solution: if we solve the GIL problem, the remaining gap time should be 2-3 ms according to the above calculation.
@robertgshaw2-neuralmagic @njhill Hello, does this PR support multiple GPUs? When testing llama2-70b on 8 GPUs, the server log got stuck here.
@lixiaolx the V1 path is still in an alpha state and does not yet support multiple GPUs, but it will soon.
@njhill, is there a plan for this asynchronous scheduling?
OK, thank you.
Not yet; our plan is to optimize other aspects first, since it will be complex to combine this with certain other optimizations.
Sorry to bother you, but I'd like to ask: how did you add nvtx to analyze the time overhead of these function calls?
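For reference, one common way to get per-function NVTX ranges (visible in Nsight Systems timelines) is a small decorator around `torch.cuda.nvtx.range_push`/`range_pop`. This is a generic sketch, not the profiling setup used in the PR; it no-ops when CUDA is unavailable so the wrapped code still runs.

```python
import functools

try:
    import torch
    # Only emit NVTX ranges when a CUDA build/device is actually present.
    _HAS_NVTX = torch.cuda.is_available()
except ImportError:
    _HAS_NVTX = False

def nvtx_range(name):
    """Decorator wrapping a function call in an NVTX range named `name`."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if _HAS_NVTX:
                torch.cuda.nvtx.range_push(name)
            try:
                return fn(*args, **kwargs)
            finally:
                if _HAS_NVTX:
                    torch.cuda.nvtx.range_pop()
        return wrapper
    return deco

# Hypothetical example: annotate a scheduling step so it shows up
# as a named range in the Nsight Systems timeline.
@nvtx_range("schedule")
def schedule(seq_groups):
    return sorted(seq_groups)
```

Running the program under `nsys profile python ...` then shows the annotated ranges alongside the CUDA kernel timeline.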
…#1138)

### Summary

Introduce vLLM AsyncLLM to support multi-turn rollout and #385 #398 #710

### Architecture



**New Components**:
- AsyncLLMWorker: standalone vllm server instance
  - FastAPI: provides an OpenAI-compatible HTTP server
  - AsyncLLM: async LLMEngine for online serving; for more details: [AsyncLLM](vllm-project/vllm#9826), [LLMEngine](https://docs.vllm.ai/en/latest/design/arch_overview.html#llmengine)
  - ExternalRayDistributedExecutor: custom executor backend that manages workers in the worker group, grabbing the corresponding workers by actor name
- AsyncLLManager: manages a group of vllm server instances (AsyncLLMWorker)
  - AsyncLLM lifecycle: initialization, wake_up, sleep
  - FastAPI service discovery
- ChatScheduler: schedules multiple chat completion requests across multiple server instances
  - Least-requests load balancing
  - Sticky sessions with prefix caching
  - Chat completion callback: tool calling

### TODO
- [x] AsyncLLM: initialization/wake_up/sleep
- [x] OpenAI API: support `/v1/chat/completions`
- [x] RayPPOTrainer integration: replace `generate_sequences` with an HTTP call to `/v1/chat/completions`
- [x] GSM8K e2e training
- [ ] Add documentation

Co-authored-by: shengguangming <shengguangming@bytedance.com>
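The "least-requests load balancing" policy mentioned for the ChatScheduler can be sketched as follows. This is a hypothetical illustration (the class and method names are invented), showing the core idea: route each new request to the server with the fewest in-flight requests.

```python
class LeastRequestsBalancer:
    """Hypothetical sketch of a least-requests load balancer:
    each new request goes to the server with the fewest in-flight requests."""

    def __init__(self, servers):
        # Track the number of in-flight requests per server.
        self.in_flight = {s: 0 for s in servers}

    def acquire(self):
        """Pick the least-loaded server and count the new request against it."""
        server = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[server] += 1
        return server

    def release(self, server):
        """Mark a request on `server` as finished."""
        self.in_flight[server] -= 1
```

A real scheduler would combine this with the sticky-session/prefix-caching policy, preferring the server that already holds a request's KV-cache prefix unless it is heavily loaded.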
verl/utils/reward_score/prime_code/testing_util.py:335:121: E501 Line too long (165 > 120) verl/utils/reward_score/prime_code/testing_util.py:386:121: E501 Line too long (209 > 120) verl/utils/reward_score/prime_code/testing_util.py:390:121: E501 Line too long (183 > 120) verl/utils/reward_score/prime_code/testing_util.py:455:121: E501 Line too long (211 > 120) verl/utils/reward_score/prime_code/testing_util.py:459:121: E501 Line too long (185 > 120) verl/utils/reward_score/prime_code/testing_util.py:582:121: E501 Line too long (197 > 120) verl/utils/reward_score/prime_code/testing_util.py:586:121: E501 Line too long (171 > 120) verl/utils/reward_score/prime_math/__init__.py:106:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:119:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:246:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:315:121: E501 Line too long (128 > 120) verl/utils/reward_score/prime_math/__init__.py:331:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:407:1: E402 Module level import not at top of file verl/utils/reward_score/prime_math/__init__.py:429:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading verl/utils/reward_score/prime_math/math_normalize.py:54:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/math_normalize.py:70:17: E722 Do not use bare `except` verl/utils/reward_score/prime_math/math_normalize.py:101:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/math_normalize.py:181:121: E501 Line too long (142 > 120) verl/utils/tokenizer.py:30:9: B028 No explicit `stacklevel` keyword argument found verl/utils/tokenizer.py:33:9: B028 No explicit `stacklevel` keyword argument 
found verl/utils/tokenizer.py:55:9: B028 No explicit `stacklevel` keyword argument found verl/utils/torch_functional.py:86:72: E741 Ambiguous variable name: `l` verl/utils/torch_functional.py:177:5: F841 Local variable `total_params` is assigned to but never used verl/utils/torch_functional.py:397:1: E402 Module level import not at top of file verl/utils/torch_functional.py:399:1: E402 Module level import not at top of file verl/utils/torch_functional.py:400:1: E402 Module level import not at top of file verl/utils/ulysses.py:246:5: F841 Local variable `sp_size` is assigned to but never used verl/workers/actor/dp_actor.py:244:13: F841 Local variable `response_mask` is assigned to but never used verl/workers/actor/megatron_actor.py:22:1: I001 [*] Import block is un-sorted or un-formatted verl/workers/actor/megatron_actor.py:85:121: E501 Line too long (122 > 120) verl/workers/actor/megatron_actor.py:86:121: E501 Line too long (128 > 120) verl/workers/actor/megatron_actor.py:89:121: E501 Line too long (133 > 120) verl/workers/actor/megatron_actor.py:96:121: E501 Line too long (126 > 120) verl/workers/actor/megatron_actor.py:175:121: E501 Line too long (135 > 120) verl/workers/actor/megatron_actor.py:237:121: E501 Line too long (150 > 120) verl/workers/actor/megatron_actor.py:243:121: E501 Line too long (144 > 120) verl/workers/actor/megatron_actor.py:245:121: E501 Line too long (130 > 120) verl/workers/actor/megatron_actor.py:247:121: E501 Line too long (122 > 120) verl/workers/actor/megatron_actor.py:286:9: F841 Local variable `input_shapes` is assigned to but never used verl/workers/critic/dp_critic.py:227:21: F841 Local variable `input_ids` is assigned to but never used verl/workers/critic/dp_critic.py:230:21: F841 Local variable `position_ids` is assigned to but never used verl/workers/megatron_workers.py:18:1: I001 [*] Import block is un-sorted or un-formatted verl/workers/reward_manager/__init__.py:15:20: F401 `.batch.BatchRewardManager` imported but unused; 
consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/__init__.py:16:19: F401 `.dapo.DAPORewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/__init__.py:17:20: F401 `.naive.NaiveRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/__init__.py:18:20: F401 `.prime.PrimeRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/prime.py:61:121: E501 Line too long (217 > 120) verl/workers/reward_model/__init__.py:15:19: F401 `.base.BasePPORewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_model/megatron/__init__.py:15:27: F401 `.reward_model.MegatronRewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_model/megatron/reward_model.py:65:9: F841 Local variable `ori_bs` is assigned to but never used verl/workers/reward_model/megatron/reward_model.py:89:121: E501 Line too long (132 > 120) verl/workers/reward_model/megatron/reward_model.py:215:9: F841 Local variable `input_shapes` is assigned to but never used verl/workers/rollout/naive/__init__.py:15:28: F401 `.naive_rollout.NaiveRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/rollout/sglang_rollout/__init__.py:14:29: F401 `.sglang_rollout.SGLangRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:22:121: E501 Line too long (129 > 120) verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:51:121: E501 Line too long (157 > 120) verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:153:13: F841 Local variable `log_probs` is assigned to but never used 
verl/workers/rollout/vllm_rollout/vllm_rollout.py:22:121: E501 Line too long (129 > 120) verl/workers/rollout/vllm_rollout/vllm_rollout.py:60:121: E501 Line too long (157 > 120) verl/workers/sharding_manager/__init__.py:16:5: F401 `verl.utils.import_utils.is_megatron_core_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:17:5: F401 `verl.utils.import_utils.is_sglang_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:21:19: F401 `.base.BaseShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:22:27: F401 `.fsdp_ulysses.FSDPUlyssesShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:29:121: E501 Line too long (149 > 120) verl/workers/sharding_manager/__init__.py:32:121: E501 Line too long (126 > 120) verl/workers/sharding_manager/fsdp_sglang.py:99:9: F841 Local variable `load_format` is assigned to but never used verl/workers/sharding_manager/fsdp_sglang.py:123:121: E501 Line too long (178 > 120) verl/workers/sharding_manager/fsdp_ulysses.py:59:13: F841 Local variable `sp_size` is assigned to but never used Found 305 errors. 
```
---------
Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>
* [logging] fix: typo of fsdp_checkpoint_manager saving optim path (#1276)
  Fixes a minor typo in the printed optimizer saving path in fsdp_checkpoint_manager.py.
* [doc] fix: fix 2 minor issues in installation and reward explanation (#1215)
  Closes #1214 and #1213.
  Co-authored-by: HL <linhaibin.eric@gmail.com>
* [merger] fix: merged generation config is inconsistent with hf pre-trained model (#1277)
  https://github.com/volcengine/verl/blob/afeac9a0230a0980e990a3c59e08e8e0890baaa4/scripts/model_merger.py#L195-L200
  A model created by `from_config` won't load `generation_config.json` from `args.hf_model_path`; instead it creates a generation config separately. This inconsistency leads to confusing generation errors when users run the vllm/hf rollout without carefully overriding sampling_params/generation_config; see the issue here: https://github.com/volcengine/verl/issues/1246
  This PR introduces a `patch_model_generation_config` function which patches the model built from config so it correctly uses the pretrained generation config. Fixes https://github.com/volcengine/verl/issues/1246.
* Option to make model private when pushing to hub, pushing the tokenizer for convenience (#1259)
  Very small changes to `model_merger.py` so that the tokenizer is pushed to the hub and the model can be pushed privately.
* [CI] feat: only check changed files (#1294)
* [example] chore: remove verl_getting_started.ipynb (#1281)
  Removes the outdated notebook.
* [doc] add the multi modal doc (#1292)
  ## Motivation
  There is currently no documentation for multimodal tasks on verl, so a related document is needed.
* docs: add DeepWiki and ICLR links (#1283)
* [docs] add pr template (#1287)
  # What does this PR do?
  Adds a PR template to improve the readability of PRs.
  ## Before submitting
  - [x] Did you read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) and finish the [code format check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
  - [ ] Did you make sure to update the documentations with your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs), especially for breaking config etc?
  - [ ] Did you write any test cases if necessary? Please add CI tests to your new feature.
* fix: catch any error in math reward function (#1312)
  # What does this PR do?
  This PR fixes crashes in the math reward function by catching any possible errors.
  ## Before submitting
  - [x] Did you read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) and finish the [code format check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
  - [x] Did you make sure to update the documentations with your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs), especially for breaking config etc?
  - [x] Did you write any test cases if necessary? Please add CI tests to your new feature.
  # Additional Info:
  - **Issue Number**: None
  - **Training**: None
  - **Inference**: None
* [vllm] add moe patch for qwen3-moe (#1316)
  # What does this PR do?
  Adds an MoE patch for qwen3-moe, fixing the weight loader issue in vLLM MoE models. This isn't a permanent solution; we may need to contribute code to vLLM to address the problem caused by FusedMoE. Suggestions for this are welcome.
  # ChangeLog:
  - Add Qwen3MoeForCausalLM class for moe_patch
* fix reward model and add CI test (#1252)
  Fixes bugs related to #1165. The Megatron backend reward model had no CI test, so one is added to the current ppo trainer. Fixes `micro_batch_size_per_gpu`, though it is unclear whether it is right for the reward config. The output format is also not right with the current `forward_micro_batch` implementation.
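The "catch any error" fix described in #1312 above is a common pattern for rule-based reward functions: a malformed model output should score zero rather than abort the whole training run. A minimal illustrative sketch (names are hypothetical, not verl's actual API):

```python
# Sketch of wrapping a reward function so any exception yields a zero
# score instead of crashing the training loop. `safe_reward` and
# `math_score` are illustrative names, not verl's implementation.
def safe_reward(score_fn):
    def wrapper(solution_str, ground_truth):
        try:
            return score_fn(solution_str, ground_truth)
        except Exception:
            # Malformed or empty model output must not abort training
            return 0.0
    return wrapper

@safe_reward
def math_score(solution_str, ground_truth):
    # Toy scorer: compare the last token of the solution to the answer
    return 1.0 if solution_str.split()[-1] == ground_truth else 0.0

print(math_score("the answer is 42", "42"))  # 1.0
print(math_score("", "42"))  # IndexError is caught internally -> 0.0
```

The decorator keeps the scoring logic itself simple while guaranteeing the reward manager always receives a float.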
* [sglang] feat: Add SGLang async multi-turn rollout with tool support (#1037)
  A redesigned version of #917.
  ## Current Status
  [Develop log & Tracker](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/113)
  **What Has Been Done**
  - Async rollout refactoring: integrate with the tool server to coordinate tool calls during generation, leveraging request IDs for state and progress tracking; support async multi-turn conversations in agentic RL training (with tool support).
  - Async request management: encapsulate rollout requests into a unified structure, enabling efficient tracking and handling of concurrent multi-turn dialogues with chatml-style messages.
  - Extensible tools: a modular design for adapting tools in the OpenAIFunctionTool format, which is supported by both SGLang and vLLM; each tool gets a separate instance, executes on tool call, computes a score according to the tool environment state, and releases its resources.
  - Multi-turn support has been implemented for the GSM8K task (a new version is in progress). However, training has not yet converged, and we hope the community can join in investigating the issue.
  **What Is WIP**
  - [x] Merge the loss mask into the training process from the last version
  - [x] Add a more user-friendly tool config and e2e tests for gsm8k with tool training
  - [ ] Validate the multi-turn feature in open-source sandbox environments
  ## Key Features to be introduced in a future version
  - Integrate a Ray-based agent trainer to enable explicit separation of the rollout and training pipeline. Provide support for partial rollout handling and fine-grained request state management.
  - Extend the framework to support simulated user interactions (e.g., roleplay, interactive feedback) and more complex environment-in-the-loop RL tasks.
  **Future Plan**
  [Discussion Thread](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/74#issuecomment-2763192625)
  The [RFC doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md) will be updated soon.
  ## Contributors & Acknowledgement
  - Xiang Long [mid.of.change@gmail.com](mailto:mid.of.change@gmail.com) @SwordFaith (Design RFC & core-dev of refactor part)
  - Yuzhen Zhou [zyzshishui@gmail.com](mailto:zyzshishui@gmail.com) @zyzshishui (Core-dev)
  - Chenyang Zhao [zhaochen20@outlook.com](mailto:zhaochen20@outlook.com) @zhaochenyang20 (PM)
  - Guanhua Wang @WANG-GH
  - Junrong Lin @ocss884 (verl-sglang support)
  - Hanchen Zhang [zhanghanchen77@gmail.com](mailto:zhanghanchen77@gmail.com)
  - Haoran Wang [ubecwang@gmail.com](mailto:ubecwang@gmail.com)
  - Rui Lu [learningrate1@gmail.com](mailto:learningrate1@gmail.com)
  - Yujiang Li [liyujiang2020@gmail.com](mailto:liyujiang2020@gmail.com)
  - Jiajun Li [guapisolo@gmail.com](mailto:guapisolo@gmail.com)
  - Jin Pan [jpan236@wisc.edu](mailto:jpan236@wisc.edu)
  - Zhi Zheng [zhengzhi@modelbest.cn](mailto:zhengzhi@modelbest.cn) @zh-zheng
  ---------
  Co-authored-by: zyzshishui <492129152@qq.com>
  Co-authored-by: guanhua <281484683@qq.com>
  Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
  Co-authored-by: ocss884 <ocss.lin@gmail.com>
  Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com>
  Co-authored-by: HL <linhaibin.eric@gmail.com>
* [fix] Remove grad_offload in rloo example script (#1323)
  # What does this PR do?
  The `grad_offload` option was removed in #284 for the fsdp backend, so the current script errors out because of it.
  # ChangeLog:
  - Remove grad_offload in the rloo example script
  # Usage
  - Run the changed script
  ## Before submitting
  - [X] Did you read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) and finish the [code format check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
  - [X] Did you make sure to update the documentations with your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs), especially for breaking config etc?
  - [X] Did you write any test cases if necessary? Please add CI tests to your new feature.
  # Additional Info:
  - **Issue Number**: N/A
  - **Training**: FSDP
  - **Inference**: None
  Signed-off-by: Hollow Man <hollowman@opensuse.org>
* cancel bootstrapping for n=n_samples (#1320)
  # What does this PR do?
  The validation metrics currently bootstrap their estimates by randomly sampling 1, 2, 4, 8, 16, ..., n_samples results out of n_samples results. However, this bootstrapping doesn't make sense for `n=n_samples`, as you cannot gain more information about the estimate for `pass@n_samples` if you only have `n_samples` samples. This produces odd results when doing RL with only one problem in the validation set (best@N is a value between 0 and 1 instead of 0 or 1). This PR turns off bootstrapping for the n=n_samples case and leaves the rest of the computations the same.
* docs: add community blogs and fix link rendering (#1324)
  # What does this PR do?
  Adds two reference blogs to the README and fixes link rendering.
  # ChangeLog:
  - Add two reference blogs to README
  # Usage
  None
  ## Before submitting
  - [x] Did you read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) and finish the [code format check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
  - [x] Did you make sure to update the documentations with your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs), especially for breaking config etc?
  - [ ] Did you write any test cases if necessary? No tests needed.
* [doc] fix dataset path for gsm8k and url error (#1327)
  # What does this PR do?
  Fixes the dataset path for gsm8k and some URL errors.
  # ChangeLog:
  Change the readme file to fix the gsm8k download path.
  # Usage
  - You can add one use example below.
  ```python
  # Add code snippet or script demonstrating how to use this
  ```
  - For algorithm implementation and new model support, you can add training curve plots and evaluation results below.
  ## Before submitting
  - [ ] Did you read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) and finish the [code format check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
  - [ ] Did you make sure to update the documentations with your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs), especially for breaking config etc?
  - [ ] Did you write any test cases if necessary? Please add CI tests to your new feature.
  # Additional Info:
  - **Issue Number**: Fixes issue # or discussion # if any.
  - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none]
  - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none]
* [feat] add FusedWorker (#1278)
  On behalf of @zw0610. FusedWorker is designed to enhance the ability of colocated workers. FusedWorker keeps most of the interfaces of colocated workers: users should use `create_colocated_worker_cls_fused` to create the colocated worker class and use `spawn` to split a FusedWorker into a dict of workers. In colocated workers, accessing the methods of child workers is done by using `spawn` and then accessing via the worker dict, or by calling `{worker_group}.{worker}_{method}`. In FusedWorker, the first method is preserved, while the latter is changed: first use `{worker_group}.fuse(prefixes)` to bind workers to the worker group, then use `{worker_group}.{worker}.foo()` to access child workers.
* [test] fix: test arithmetic_sequence failed to run (#1333)
  # What does this PR do?
  The e2e test `arithmetic_sequence` is currently broken, with the error `TypeError: not a string` thrown on the code `tokenizer = AutoTokenizer.from_pretrained(local_path)` when running `tests/e2e/run_ray_trainer.sh`. This PR aims to fix it.
  In the `arithmetic_sequence` task, the `tests.e2e.envs.digit_completion` module was imported at the beginning but not used, so the import seems meaningless. However, when this library is imported, `AutoTokenizer.register()` is called to set configurations for `AutoTokenizer`. Only after that can `AutoTokenizer` be successfully initialized in test code to perform subsequent tasks.
  ## Timeline
  - In #934, to improve CI efficiency, the CI corresponding to `arithmetic_sequence` was removed.
  - In #1010, per the `unused_import` rule, this import was deleted, triggering the bug.
  # ChangeLog
  - `AutoTokenizer.register` is now called explicitly, which ensures the configurations are set before initialization of `AutoTokenizer`.
  # Usage
  - The original script `tests/e2e/run_ray_trainer.sh` is available for tests.
  ```python
  bash tests/e2e/run_ray_trainer.sh
  ```
  ## Before submitting
  - [x] Did you read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) and finish the [code format check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
  - [x] Did you make sure to update the documentations with your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs), especially for breaking config etc?
  - [x] Did you write any test cases if necessary? Please add CI tests to your new feature.
  # Additional Info:
  - **Issue Number**: none
  - **Training**: none
  - **Inference**: none
* [FIX] metric_utils log best, worst, maj only for n_resps > 1 (#1248)
  Solves #1249. Instead of logging best@1/mean and worst@1/mean, which are identical to mean@1, simply do not log them when there is only one validation response per prompt (`n_resps == 1`). The same applies to std. Otherwise we get many duplicated plots that show the same thing. The only change is the addition of the `if n_resps > 1:` statement.
* [dev] feat: improve PR template (#1343)
  This PR tries to improve the PR template itself.
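The `if n_resps > 1:` guard from #1248 above can be sketched in isolation. This is an illustrative stand-in, not verl's actual `metric_utils`; `summarize` and its keys are hypothetical names:

```python
# Sketch: only log best/worst/std when there are multiple responses per
# prompt, since with a single response they would all duplicate the mean.
import statistics

def summarize(scores):
    n = len(scores)
    metrics = {"mean": sum(scores) / n}
    if n > 1:  # best@N / worst@N / std are uninformative for N == 1
        metrics["best"] = max(scores)
        metrics["worst"] = min(scores)
        metrics["std"] = statistics.stdev(scores)
    return metrics

print(summarize([0.5]))            # only the mean is logged
print(summarize([0.0, 1.0, 1.0]))  # full set of metrics
```

With one response per prompt the dashboard shows a single mean curve instead of several identical plots, which is exactly the de-duplication the PR describes.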
* [recipe] feat: latest reproduction of DAPO (#1336)
  # What does this PR do?
  This PR updates the latest reproduction results of DAPO.
  ## Before submitting
  - [x] Did you read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) and finish the [code format check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
  - [x] Did you make sure to update the documentations with your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs), especially for breaking config etc?
  - [x] Did you write any test cases if necessary? Please add CI tests to your new feature.
  # Additional Info:
  - **Issue Number**: none
  - **Training**: none
  - **Inference**: none
* [docs] fix: typo (#1351)
* [installation] doc: Fix pip install instructions (#1353)
  ### Checklist Before Starting
  - [X] Search for similar PR(s).
  ### What does this PR do?
  There should be no space between `.` and `[vllm]` or `[sglang]`, or it will result in an error:
  ```logs
  ERROR: Invalid requirement: '[vllm]': Expected package name at the start of dependency specifier [vllm]
  ```
  In addition, this part is rewritten to make the instructions clearer (as `.. or ..` can't be executed by bash directly).
  ### Additional Info.
  - **Issue Number**: none
  - **Training**: none
  - **Inference**: none
  ### Checklist Before Submitting
  - [X] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
  - [X] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
  - [X] Add `[BREAKING]` to the PR title if it breaks any API.
  - [X] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
  - [X] Add CI test(s) if necessary.
  Signed-off-by: Hollow Man <hollowman@opensuse.org>
* [fsdp] feat: support fsdp2 training and inference in fsdp_workers (#1026)
  # What does this PR do?
  This PR supports fsdp2 for fsdp_worker.
  Torch version 2.4 or higher is required.
  # Usage Example
  ```sh
  examples/grpo_trainer/run_qwen2-7b.sh \
      actor_rollout_ref.ref.strategy=fsdp2 \
      actor_rollout_ref.actor.strategy=fsdp2
  ```
  To save more memory, you can add the parameter below to enable the fsdp2 OffloadPolicy:
  ```
  actor_rollout_ref.actor.offload_policy=True
  ```
  You can see the profile comparison between fsdp1 and fsdp2 here: https://github.com/volcengine/verl/pull/1026#issuecomment-2824343860
  ---------
  Co-authored-by: lixiaoguang12 <lixiaoguang12@meituan.com>
  Co-authored-by: shengguangming <shengguangming@bytedance.com>
* [docs] fix: Fix Arxiv Link (#1364)
  The arXiv link is not rendering on GitHub or https://verl.readthedocs.io/en/latest/index.html#
  ### Checklist Before Starting
  - [x] Search for similar PR(s).
  ### What does this PR do?
  Makes the external link to the arXiv paper resolve properly.
  ### High-Level Design
  N/A
  ### Specific Changes
  Single-line doc change
  ### API
  N/A
  ### Usage Example
  N/A
  ### Test
  N/A
  ### Additional Info.
  ### Checklist Before Submitting
  All N/A
* [dataproto] feat: Add auto padding for DataProto (#1356)
  ### Checklist Before Starting
  - [x] Search for similar PR(s). Coming from #577, credit to @zw0610.
  ### What does this PR do?
  Today, users must manually duplicate (repeat) a DataProto so its batch size matches the data-parallel (dp) size of the target WorkerGroup. This PR enables `auto_padding` to pad the `DataProto` when chunk is called.
  ### Specific Changes
  * Enriched `DataProto` so that it carries padding context during chunking;
  * Modified `decorator.py` so that a DataProto can be automatically padded and chunked with `dispatch_dp_compute_data_proto`;
  * Added unit tests under `tests/ray/test_auto_padding.py`.
  ### API
  Two new APIs under `DataProto` are introduced: `padding` and `is_padding_enabled`.
  ### Test
  Tests added to `tests/ray/test_auto_padding.py`
  ### Additional Info.
  - **Issue Number**: Fixes issue # or discussion # if any.
  - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none]
  - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none]
  ### Checklist Before Submitting
  - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
  - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
  - [ ] Add `[BREAKING]` to the PR title if it breaks any API.
  - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
  - [x] Add CI test(s) if necessary.
  ---------
  Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
  Co-authored-by: Wang Zhang <zhangwang.nozomi@bytedance.com>
  Co-authored-by: Wang Zhang <zw199006@gmail.com>
* [ray] feat: Making decorator register available for async function (#1370)
  ### Checklist Before Starting
  - [x] Search for similar PR(s).
  ### What does this PR do?
  This PR enables the decorators to be applied to async functions.
  ### High-Level Design
  * Simply added an inner wrapper for async functions inside the `register` function.
  ### Usage Example
  ```python
  @register(dispatch_mode=Dispatch.ONE_TO_ALL, blocking=False)
  async def async_fn(self, sleep_time):
      return await asyncio.sleep(sleep_time * 0.1)
  ```
  ### Test
  * `tests/ray/test_decorator.py`
  ### Additional Info.
  - **Issue Number**: Fixes issue # or discussion # if any.
  - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none]
  - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none]
  ### Checklist Before Submitting
  - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
  - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
  - [ ] Add `[BREAKING]` to the PR title if it breaks any API.
  - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
  - [x] Add CI test(s) if necessary.
  ---------
  Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
* docs: Add runllm widget for VeRL Doc sites (#1366)
  ### Checklist Before Starting
  - [ ] Search for similar PR(s).
  ### What does this PR do?
  Adds the runllm widget for https://app.readthedocs.org/projects/verl/
  ### High-Level Design
  > Demonstrate the high-level design if this PR is complex.
  ### Specific Changes
  > List the specific changes.
  ### API
  > Demonstrate how the API changes if any.
  ### Usage Example
  > Provide usage example(s) for easier usage.
  ```python
  # Add code snippet or script demonstrating how to use this
  ```
  ### Test
  > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.
  ### Additional Info.
  - **Issue Number**: Fixes issue # or discussion # if any.
  - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none]
  - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none]
  ### Checklist Before Submitting
  - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
  - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
  - [ ] Add `[BREAKING]` to the PR title if it breaks any API.
  - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
  - [ ] Add CI test(s) if necessary.
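The async-aware decorator from #1370 above hinges on one idea: detect coroutine functions and wrap them with an async inner function so the same registration metadata works for both sync and async methods. A minimal sketch under that assumption; `register` and `attrs` here are illustrative, not verl's actual implementation:

```python
# Sketch: a registration decorator whose inner wrapper is chosen based on
# whether the decorated function is a coroutine function.
import asyncio
import inspect
from functools import wraps

def register(**attrs):
    def decorator(func):
        if inspect.iscoroutinefunction(func):
            @wraps(func)
            async def inner(*args, **kwargs):
                # Awaiting here keeps the wrapper itself awaitable
                return await func(*args, **kwargs)
        else:
            @wraps(func)
            def inner(*args, **kwargs):
                return func(*args, **kwargs)
        inner.attrs = attrs  # metadata consumed later by a dispatcher
        return inner
    return decorator

@register(blocking=False)
async def async_fn(x):
    await asyncio.sleep(0)
    return x * 2

print(asyncio.run(async_fn(21)))  # 42: the wrapped coroutine still awaits
print(async_fn.attrs)             # {'blocking': False}
```

Without the coroutine branch, a plain sync wrapper would return an un-awaited coroutine object, which is the failure mode this kind of change guards against.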
* [trainer] breaking: pass dataset as required args to SFTTrainer; also change ppo ray trainer to take custom datasets as inputs (#1282) * [ci][fix] Enable part of ray test to be run on CPU machine (#1372) * [fix][ci] fix two pipelines that fails on the main branch (#1378) * [feat] Enable `update_model_config` to take nested dict to update `AutoConfig` of transformers (#1379) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? * Enable `update_model_config` to take nested dict to update `AutoConfig` of transformers * Added a test pipeline for all the tests under `tests/utils`, Any future unit tests for `verl/utils` should be added here * Re-organized the tests file structure. ### Usage Example For the new `update_model_config`, an example looks like below: ```python override_config_kwargs = { "bos_token_id": self.tokenizer.bos_token_id, ... "nested_config": {k1: v1, k2, v2}, } update_model_config(actor_model_config, override_config_kwargs=override_config_kwargs) ``` ### Test Added `tests/verl/utils/test_model.py::test_update_model_config` ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary. 
--------- Signed-off-by: Hongpeng Guo <hg5@illinois.edu> * [rollout] misc: add demo chat completion scheduler described in ReTool paper (#1297) Co-authored-by: shengguangming <shengguangming@bytedance.com> * [dev] fix: validation metrics (#1374) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? 1. Fix the error that `metric` is not added when `n == 1`. 2. Remove `std@1`. 3. Add assertation for doing initial validation but `val_metrics` is empty. ### Additional Info. - **Issue Number**: none - **Training**: none - **Inference**: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary. * [sglang] Upgrade sglang to 0.4.6.post1 & misc fixes (#1385) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? - [x] upgrade required sglang version to 0.4.6.post1 which suports Qwen3 - [x] fix: flush_cache was never awaited - [x] remove unused env - [x] fix: add rank num to port to avoid SGLang picking the same port when random.seed being set - [x] feat: disable SGLang memory inbalance check by default https://github.com/sgl-project/sglang/pull/5426 - [x] update setup.py to avoid old version pip can not resolving deps - [x] fix: tools_kwargs length mismatch with batch #1380 > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. 
```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title …
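The decorator change in #1370 above works by choosing a wrapper based on whether the decorated callable is a coroutine function. A minimal sketch of that pattern (the names and the `attrs` attribute here are illustrative, not verl's actual implementation):

```python
import asyncio
import inspect
from functools import wraps

def register(dispatch_mode=None, blocking=True):
    """Attach dispatch metadata; works for both sync and async functions."""
    def decorator(func):
        if inspect.iscoroutinefunction(func):
            # Inner async wrapper so awaiting the decorated coroutine still works.
            @wraps(func)
            async def async_wrapper(*args, **kwargs):
                return await func(*args, **kwargs)
            async_wrapper.attrs = {"dispatch_mode": dispatch_mode, "blocking": blocking}
            return async_wrapper

        @wraps(func)
        def sync_wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        sync_wrapper.attrs = {"dispatch_mode": dispatch_mode, "blocking": blocking}
        return sync_wrapper
    return decorator

@register(dispatch_mode="ONE_TO_ALL", blocking=False)
async def async_fn(x):
    await asyncio.sleep(0)
    return x * 2

print(asyncio.run(async_fn(21)))     # 42
print(async_fn.attrs["blocking"])    # False
```

The key detail is `inspect.iscoroutinefunction`: without the async branch, a plain wrapper would return an un-awaited coroutine and the dispatch metadata would be lost.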
The final change in that squash is the one that references this PR (volcengine#1138):

### Summary
Introduce vLLM AsyncLLM to support multi-turn rollout (volcengine#385, volcengine#398, volcengine#710).

### Architecture



**New components**:
- **AsyncLLMWorker**: a standalone vLLM server instance
  - FastAPI: provides the OpenAI-compatible HTTP server
  - AsyncLLM: async LLMEngine for online serving; for more details see [AsyncLLM](vllm-project/vllm#9826) and [LLMEngine](https://docs.vllm.ai/en/latest/design/arch_overview.html#llmengine)
  - ExternalRayDistributedExecutor: a custom executor backend that manages the workers in the worker group, grabbing the corresponding workers by actor name
- **AsyncLLMManager**: manages a group of vLLM server instances (AsyncLLMWorker)
  - AsyncLLM lifecycle: initialization, wake_up, sleep
  - FastAPI service discovery
- **ChatScheduler**: schedules multiple chat completion requests across multiple server instances
  - Least-requests load balancing
  - Sticky sessions with prefix caching
  - Chat completion callback: tool calling

### TODO
- [x] AsyncLLM: initialization/wake_up/sleep
- [x] OpenAI API: support `/v1/chat/completions`
- [x] RayPPOTrainer integration: replace `generate_sequences` with an HTTP call to `/v1/chat/completions`
- [x] GSM8K e2e training
- [ ] Add documentation

Co-authored-by: shengguangming
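The ChatScheduler's two routing policies above — least-requests load balancing, plus sticky sessions so requests sharing a prompt prefix hit the same server and benefit from its prefix cache — can be sketched as follows. The class and method names are hypothetical, not verl's actual API:

```python
class ChatScheduler:
    """Route chat requests across several server instances."""

    def __init__(self, servers):
        self.inflight = {s: 0 for s in servers}  # in-flight request count per server
        self.sticky = {}                          # session key -> pinned server

    def pick(self, session_key=None):
        # Sticky session: reuse the server that already holds this prefix's cache.
        if session_key is not None and session_key in self.sticky:
            return self.sticky[session_key]
        # Otherwise least-requests: the server with the fewest in-flight requests.
        server = min(self.inflight, key=self.inflight.get)
        if session_key is not None:
            self.sticky[session_key] = server
        return server

    def start(self, server):
        self.inflight[server] += 1

    def finish(self, server):
        self.inflight[server] -= 1

sched = ChatScheduler(["server-0", "server-1"])
s1 = sched.pick(session_key="chat-abc")
sched.start(s1)
s2 = sched.pick()                         # load-balanced onto the idle server
s3 = sched.pick(session_key="chat-abc")   # sticky: routed back to s1
```

The sticky map deliberately overrides the load-balancing choice: losing the prefix cache usually costs more than a slightly uneven request count.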
SUMMARY:

- `AsyncLLM` in V1 - better overlapping of GPU and CPU

TODO:

- io threads

FOLLOW UP PRS:

- `AsyncLLM` and `LLMEngine` tests (abort, stop string, other unit tests)
- `LLM` by default (need to figure out a way around fork) - currently, need to set `VLLM_ENABLE_V1_MULTIPROCESSING=1`

DIAGRAM:

Note: there is now an `EngineCoreClient` class that is used by the `AsyncLLM` to interact with the `EngineCore`, but the overall architecture is close to what we have.

- move `output_handler_loop` to `EngineCore`
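The `output_handler_loop` mentioned above is essentially a fan-out pattern: one background asyncio task drains outputs produced by the engine core and routes them to per-request streams, so per-request consumers never block the engine loop and CPU-side work overlaps the GPU forward pass. A minimal sketch under assumed names (none of these are vLLM's actual classes):

```python
import asyncio

class AsyncStream:
    """Per-request stream a caller can await tokens from."""

    def __init__(self):
        self._q = asyncio.Queue()

    def put(self, item):
        self._q.put_nowait(item)

    async def get(self):
        return await self._q.get()

async def output_handler_loop(engine_outputs: asyncio.Queue, streams: dict):
    # One background task drains all engine outputs and fans them out
    # to the stream for the request they belong to.
    while True:
        request_id, token = await engine_outputs.get()
        if request_id is None:  # shutdown sentinel
            break
        streams[request_id].put(token)

async def main():
    outputs = asyncio.Queue()
    streams = {"req-0": AsyncStream()}
    handler = asyncio.create_task(output_handler_loop(outputs, streams))
    # Simulate the engine core producing tokens for request "req-0".
    for tok in ["Hello", " world"]:
        outputs.put_nowait(("req-0", tok))
    outputs.put_nowait((None, None))
    received = [await streams["req-0"].get(), await streams["req-0"].get()]
    await handler
    return received

print(asyncio.run(main()))  # ['Hello', ' world']
```

Because the handler is a plain asyncio task, it naturally yields whenever it waits on the queue, which is what lets detokenization and serialization overlap the forward pass instead of serializing behind it.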