Past roadmap for reference: #22
Agentic RL: Environment interaction & tool support [P0]
- Integrate SandBox for code generation tasks (more mature than verl's current code sandbox)
- search / environment interaction via HTTP / gRPC
- multi-turn optimizations: better KV cache management and streaming generation (potential inference engine dependency). See [sglang] feat: Add SGLang async multi-turn rollout with tool support #1037 and [rollout] feat: introduce vLLM AsyncLLM to support multi-turn rollout #1138; KV cache optimization will be left to the inference engines. A minimal sketch of such a multi-turn, tool-calling rollout loop follows this list.
- further multi-turn rollout improvements: see Multi-turn rollout & agentic RL Status & Roadmap zhaochenyang20/Awesome-ML-SYS-Tutorial#131
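To clarify what the multi-turn, tool-calling rollout above involves, here is a minimal, hypothetical sketch of the control flow. `llm_generate` and `run_tool` are placeholders standing in for the async inference engine (SGLang / vLLM AsyncLLM) and the tool executor; they are not verl APIs.

```python
import asyncio

# Hypothetical stand-ins for the inference engine and tool executor (not verl APIs).
async def llm_generate(messages):
    """Return the next assistant turn; replace with a real async engine call."""
    return {"content": "final answer", "tool_call": None}

async def run_tool(tool_call):
    """Execute the requested tool (search, code sandbox, ...) and return its output."""
    return "tool output"

async def multi_turn_rollout(prompt, max_turns=4):
    """One trajectory: alternate model turns and tool calls until the model answers."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        reply = await llm_generate(messages)
        messages.append({"role": "assistant", "content": reply["content"]})
        if not reply.get("tool_call"):      # the model chose to answer directly
            break
        observation = await run_tool(reply["tool_call"])
        messages.append({"role": "tool", "content": observation})
    return messages                          # trajectory later scored and used for RL

async def rollout_batch(prompts):
    # Many trajectories in flight concurrently; KV-cache reuse across turns is
    # left to the inference engine, as noted above.
    return await asyncio.gather(*(multi_turn_rollout(p) for p in prompts))
```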
Scaling up RL & system performance [P0]
- Ring Attention
- Ulysses sequence parallel for VLM models, e.g. Qwen2VL
- reference system tuning script for best RL throughput on different types of accelerators
- multi-node rollout (potential inference engine dependency)
- alignment loss fused kernels (Feat/memory optimized loss #1212)
Usability improvement
- make the current Ray trainer easier to extend without modifying verl source code or maintaining forks. Users can already define a custom reward function from the command line; ideally the main training loop should also accept a custom dataset (see https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/ray_trainer.py#L41). A sketch of such a plug-in dataset follows this list.
- collect benchmark results for TorchTitan vs. Megatron
- support TorchTitan n-d parallelism for better usability
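To illustrate the custom-dataset extension point, here is a rough sketch of what a user-defined dataset could look like. The class name, constructor arguments, and item layout are assumptions, not an existing verl interface; the roadmap item is about letting the trainer load such a class without editing ray_trainer.py.

```python
import pandas as pd
from torch.utils.data import Dataset


class MyRLHFDataset(Dataset):
    """Hypothetical user-defined prompt dataset (names and fields are assumptions)."""

    def __init__(self, parquet_path, tokenizer, max_prompt_length=1024):
        self.frame = pd.read_parquet(parquet_path)
        self.tokenizer = tokenizer
        self.max_prompt_length = max_prompt_length

    def __len__(self):
        return len(self.frame)

    def __getitem__(self, idx):
        row = self.frame.iloc[idx]
        enc = self.tokenizer(
            row["prompt"],
            truncation=True,
            max_length=self.max_prompt_length,
            return_tensors="pt",
        )
        return {
            "input_ids": enc["input_ids"][0],
            "attention_mask": enc["attention_mask"][0],
            # extra fields (e.g. ground truth for a rule-based reward) can ride along
            "reward_model": {"ground_truth": row.get("answer", "")},
        }
```

The idea mirrors the existing custom-reward hook: the user points the trainer at this class (e.g. via a config key) instead of patching the training loop; the exact wiring is what this roadmap item covers.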
Latest Model & Algorithm Support
See https://verl.readthedocs.io/en/latest/advance/fsdp_extension.html for adding models with FSDP backend
See https://verl.readthedocs.io/en/latest/advance/megatron_extension.html for adding models with Megatron backend.
- gemma3 https://github.com/volcengine/verl/pull/1613/files
- deepseek v3 - since it is large, we should start with SFT for correctness verification and optimize it with fused kernels/recomputation before moving to RL ([Project] deepseek R1 infrastructure #708, Add DeepSeek 671B GRPO example #1771)
- qwen3 & qwen3-moe Adding Qwen3 and Qwen3MoE huggingface/transformers#36878
and any other popular models. - OLMo2
- Dr. GRPO ([Feature Request] Support Dr. GRPO for Unbiased Optimization in RL Training #742)
Component Continuous Updates
- verify Ulysses sequence parallelism works with the latest versions of transformers (>= v4.50)
- replace FSDP1 with FSDP2 ([fsdp] feat: support fsdp2 training and inference in fsdp_workers #1026)
- add activation offloading optimization https://github.com/volcengine/verl/pull/1220/files (see the sketch after this list)
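As a point of reference for the activation offloading item, the technique can be prototyped with PyTorch's built-in saved-tensor hooks; this is a generic sketch of the idea, not the implementation in the linked PR.

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 4096),
).cuda()

x = torch.randn(8, 4096, device="cuda", requires_grad=True)

# Saved activations are moved to pinned CPU memory during the forward pass and
# copied back to the GPU on demand during the backward pass.
with torch.autograd.graph.save_on_cpu(pin_memory=True):
    loss = model(x).sum()

loss.backward()
```

Trading GPU memory for host-device copies like this helps most with long sequences; a production version would overlap the transfers with compute.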
dataset & benchmark
- GPQA Diamond (English)
- LiveCodeBench (code)
- SWE-bench Verified (code)
- CNMO 2024 (math)
- codecontests (code generation)
- TACO (code generation)
- competition_math (math)
Please also help provide scripts to reproduce the evaluation performance of publicly released models; a hypothetical example of such a script is sketched below.
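A reproduction script could look roughly like this sketch. The dataset name, checkpoint, prompt template, and exact-match scorer are placeholders and would need to follow each benchmark's official protocol.

```python
# Hypothetical evaluation sketch: dataset, checkpoint, prompt format and the
# exact-match scorer below are placeholders, not an official protocol.
from datasets import load_dataset
from vllm import LLM, SamplingParams


def exact_match(pred: str, gold: str) -> bool:
    return pred.strip() == gold.strip()


def main():
    data = load_dataset("hendrycks/competition_math", split="test")  # example benchmark
    llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")                      # example checkpoint
    params = SamplingParams(temperature=0.0, max_tokens=1024)

    prompts = [f"Problem: {ex['problem']}\nAnswer:" for ex in data]
    outputs = llm.generate(prompts, params)

    correct = sum(
        exact_match(out.outputs[0].text, ex["solution"])
        for out, ex in zip(outputs, data)
    )
    print(f"accuracy: {correct / len(data):.4f}")


if __name__ == "__main__":
    main()
```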
Efficient RL / codesign [P1]
- LoRA support for RL, with a convergence report ([feat] Add: new feature -- LoRA support for PPO #1127)
Wide Hardware Coverage
Make the experience on non-NVIDIA GPUs smoother
- stable Ascend NPU support, with reproducible examples and logs
- stable AMD GPU support, with SGLang
- AMD GPU with mcore support
Make verl easier to extend with custom train/infer engine and roles
- [single_controller][decorator] Define a DynamicEnum class to make Dispatch and Execute extensible. #1424 (see the sketch after this list)
- [RFC] engine interface for training backends (FSDP, FSDP2, torchtitan, Megatron, Mindspore, PAI-Megatron, etc) #1371
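For context on the DynamicEnum item: the idea is an enum-like registry whose members can be added at runtime, so users can register new Dispatch/Execute modes without patching verl. A minimal, hypothetical sketch (not verl's actual implementation):

```python
class DynamicEnum:
    """Enum-like registry whose members can be added at runtime (illustrative sketch)."""

    _registry: dict[str, "DynamicEnum"] = {}

    def __init__(self, name: str, value: int):
        self.name = name
        self.value = value

    @classmethod
    def register(cls, name: str) -> "DynamicEnum":
        key = name.upper()
        if key in cls._registry:
            raise ValueError(f"{key} already registered")
        member = cls(key, len(cls._registry))
        cls._registry[key] = member
        setattr(cls, key, member)          # allows Dispatch.MY_MODE-style access
        return member

    @classmethod
    def from_name(cls, name: str) -> "DynamicEnum":
        return cls._registry[name.upper()]


class Dispatch(DynamicEnum):
    """Dispatch modes: built-ins registered here, extensions added by users."""
    _registry = {}


# built-in modes
Dispatch.register("one_to_all")
Dispatch.register("dp_compute")

# a user extension, registered without touching verl source
Dispatch.register("my_custom_sharding")
```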
other community requests
- retool (We need Retool !! #1169)
- load balancing ([Feature Request] Load Balancing for rollout phase #658)