
[roadmap] verl development Q2 #710

@eric-haibin-lin

Past roadmap for reference: #22

Agentic RL: Environment interaction & tool support [P0]

Scaling up RL & system performance [P0]

  • Ring Attention
  • Ulysses sequence parallelism for VLMs, e.g. Qwen2VL
  • a reference system-tuning script for best RL throughput on different accelerator types
  • multi-node rollout (potential inference engine dependency)
  • fused kernels for alignment losses (Feat/memory optimized loss #1212)
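For readers new to Ring Attention: the sequence is split across devices, key/value blocks are passed around a ring, and each device merges partial results with an online (streaming) softmax so the full attention matrix is never materialized. The sketch below simulates only those numerics for a single query vector on one process in pure Python; it is not verl's implementation, and the helper names (`full_attention`, `ring_attention`, `kv_shards`) are hypothetical. It assumes no causal mask and replaces inter-device communication with list indexing.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Shard = Tuple[List[Vector], List[Vector]]  # (keys, values) owned by one rank


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def full_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Reference: softmax(q . K^T / sqrt(d)) V over the whole sequence at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def ring_attention(q: Vector, kv_shards: List[Shard], my_rank: int) -> Vector:
    """Blockwise attention for one query held by `my_rank`.

    In a real ring, each step receives the next (keys, values) shard from a
    neighbour; here we simply visit the shards in ring order starting at our own.
    """
    scale = 1.0 / math.sqrt(len(q))
    dim = len(kv_shards[0][1][0])
    running_max = float("-inf")  # max score seen so far
    running_sum = 0.0            # sum of exp(score - running_max)
    acc = [0.0] * dim            # sum of exp(score - running_max) * value
    n = len(kv_shards)
    for step in range(n):
        keys, values = kv_shards[(my_rank + step) % n]
        scores = [dot(q, k) * scale for k in keys]
        new_max = max(running_max, max(scores))
        # Online softmax: rescale earlier partial sums to the new max.
        # On the first step exp(-inf) == 0.0, so the empty accumulators stay zero.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, values):
            w = math.exp(s - new_max)
            running_sum += w
            for j in range(dim):
                acc[j] += w * v[j]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Every rank ends up with the same result as the single-device reference (up to floating-point rounding), which is what lets the sequence length scale with the number of devices.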

Usability improvement

  • make the current Ray trainer easier to extend without modifying verl source code or maintaining forks. Users can already define a custom reward via the command line without touching verl source code; ideally, the main training loop should accept a custom dataset as well: https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/ray_trainer.py#L41.
  • collect benchmark results comparing TorchTitan and Megatron
  • support TorchTitan N-D parallelism for better usability
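As an illustration of the "custom reward without modifying verl source" path, here is a minimal sketch of a standalone reward file. It assumes the reward function receives `(data_source, solution_str, ground_truth, extra_info=None)` and returns a float; treat that signature as an assumption to verify against the current verl docs. The `\boxed{...}` answer extraction and the helper names are hypothetical choices for a math-style task.

```python
import re
from typing import Optional

# Matches \boxed{...} without nested braces, e.g. \boxed{42}.
# Assumption for this sketch: answers like \boxed{\frac{1}{2}} are NOT handled.
_BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def _normalize(answer: str) -> str:
    # Drop surrounding whitespace, a trailing period, and inner spaces.
    return answer.strip().rstrip(".").replace(" ", "")


def compute_score(data_source: str, solution_str: str, ground_truth: str,
                  extra_info: Optional[dict] = None) -> float:
    """Return 1.0 if the last boxed answer matches ground_truth, else 0.0."""
    matches = _BOXED.findall(solution_str)
    if not matches:
        return 0.0
    predicted = matches[-1]  # use the final boxed answer in the response
    return 1.0 if _normalize(predicted) == _normalize(ground_truth) else 0.0
```

Such a file would typically be wired in through config overrides along the lines of `custom_reward_function.path=/path/to/my_reward.py` and `custom_reward_function.name=compute_score`; the key names are an assumption here and should be checked against the current verl configuration reference. A custom dataset hook in the trainer would ideally follow the same pattern: a path plus a name, no fork required.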

Latest Model & Algorithm Support

See https://verl.readthedocs.io/en/latest/advance/fsdp_extension.html for adding models with FSDP backend
See https://verl.readthedocs.io/en/latest/advance/megatron_extension.html for adding models with Megatron backend.

Component Continuous Updates

Datasets & benchmarks

  • GPQA Diamond (English)
  • LiveCodeBench (code)
  • SWE-bench Verified (code)
  • CNMO 2024 (math)
  • codecontests (Code Generation)
  • TACO (Code Generation)
  • competition_math (Math)

Please also help provide scripts to reproduce the evaluation performance of publicly released models.
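Code benchmarks such as LiveCodeBench, codecontests, and TACO are commonly reported as pass@k. As one possible building block for such reproduction scripts, below is a sketch of the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), introduced alongside HumanEval. The helper names are hypothetical and not part of verl, and whether a given benchmark's official numbers use exactly this estimator should be checked against that benchmark's own evaluation code.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of samples contains at least one correct sample.
        return 1.0
    # Probability that a random size-k subset contains no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; `results` holds (n, c) pairs."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)
```

For example, with 10 samples per problem where one problem has 5 correct and another has 10 correct, `mean_pass_at_k([(10, 5), (10, 10)], k=1)` gives 0.75.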

Efficient RL / codesign [P1]

Wide Hardware Coverage

Make the experience on non-NVIDIA accelerators smoother

  • stable Ascend NPU support, with reproducible examples and logs
  • stable AMD GPU support with SGLang
  • AMD GPU support with Megatron-Core (mcore)

Make verl easier to extend with custom train/infer engine and roles

Other community requests
