veRL is a flexible, efficient, and production-ready RL training framework designed for large language models (LLMs). It is the open-source implementation of the HybridFlow paper.
veRL is flexible and easy to use with:
- Easy extension of diverse RL algorithms: The hybrid programming model combines the strengths of the single-controller and multi-controller paradigms to enable flexible representation and efficient execution of complex post-training dataflows, allowing users to build RL dataflows in a few lines of code.
- Seamless integration of existing LLM infra with modular APIs: Decouples computation and data dependencies, enabling seamless integration with existing LLM frameworks such as PyTorch FSDP, Megatron-LM, and vLLM. Users can also easily extend it to other LLM training and inference frameworks.
- Flexible device mapping: Supports various placements of models onto different sets of GPUs for efficient resource utilization and scalability across different cluster sizes.
- Ready integration with popular HuggingFace models
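To make the single-controller idea and device mapping above more concrete, here is a minimal, framework-free sketch. It is not the veRL API: `WorkerGroup`, `build_groups`, `rl_step`, and the toy `generate`/`score`/`update` functions are hypothetical names invented for illustration, and the "GPU placement" is only a list of integers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class WorkerGroup:
    """Hypothetical stand-in for a set of workers placed on specific GPUs."""
    role: str
    gpu_ids: List[int]
    fn: Callable[[Dict], Dict]

    def run(self, batch: Dict) -> Dict:
        # A real system would dispatch to remote processes on self.gpu_ids;
        # this sketch simply calls the function locally.
        return self.fn(batch)


def generate(batch: Dict) -> Dict:
    # Toy rollout: append a fixed suffix to each prompt.
    return {"responses": [p + " -> answer" for p in batch["prompts"]]}


def score(batch: Dict) -> Dict:
    # Toy reward: the length of each response.
    return {"rewards": [float(len(r)) for r in batch["responses"]]}


def update(batch: Dict) -> Dict:
    # Toy "training step": report the mean reward only.
    return {"mean_reward": sum(batch["rewards"]) / len(batch["rewards"])}


def build_groups(placement: Dict[str, List[int]]) -> Dict[str, WorkerGroup]:
    """Map each role to a worker group on the GPUs named in `placement`."""
    fns = {"rollout": generate, "reward": score, "actor": update}
    return {role: WorkerGroup(role, placement[role], fn) for role, fn in fns.items()}


def rl_step(groups: Dict[str, WorkerGroup], prompts: List[str]) -> Dict:
    """Single-controller view of one iteration: generate, score, update."""
    batch: Dict = {"prompts": prompts}
    for role in ("rollout", "reward", "actor"):
        batch.update(groups[role].run(batch))
    return batch
```

In this sketch, changing the placement (for example, colocating `rollout` and `actor` on the same GPUs versus splitting them) only changes the dictionary passed to `build_groups`; the control flow in `rl_step` stays the same, which is the property the bullet on flexible device mapping describes.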
veRL is fast with:
- State-of-the-art throughput: By seamlessly integrating existing SOTA LLM training and inference frameworks, veRL achieves high generation and training throughput.
- Efficient actor model resharding with 3D-HybridEngine: Eliminates memory redundancy and significantly reduces communication overhead during transitions between training and generation phases.
| Documentation | Paper | Slack | Wechat |
- [2024/12] The team presented Post-training LLMs: From Algorithms to Infrastructure at NeurIPS 2024. Slides and video available.
- [2024/08] HybridFlow (verl) is accepted to EuroSys 2025.
- FSDP and Megatron-LM for training.
- vLLM and TGI for rollout generation; SGLang support is coming soon.
- HuggingFace model support
- Supervised fine-tuning
- Reward model training
- Reinforcement learning from human feedback with PPO
- flash-attention integration, sequence packing
- scales up to 70B models and hundreds of GPUs
- experiment tracking with wandb and mlflow
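As a rough illustration of the sequence packing mentioned in the list above, the sketch below concatenates variable-length token sequences into one flat buffer and records cumulative boundaries, which is the general layout that variable-length attention kernels consume. It is a plain-Python sketch under stated assumptions, not veRL's implementation; `pack_sequences`, `unpack_sequences`, and `padding_saved` are hypothetical helper names.

```python
from itertools import accumulate
from typing import List, Tuple


def pack_sequences(seqs: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Concatenate sequences and return (flat_tokens, cumulative_lengths)."""
    flat = [tok for seq in seqs for tok in seq]
    # cu_seqlens[i] is the start offset of sequence i; the last entry is the total.
    cu_seqlens = [0, *accumulate(len(seq) for seq in seqs)]
    return flat, cu_seqlens


def unpack_sequences(flat: List[int], cu_seqlens: List[int]) -> List[List[int]]:
    """Invert pack_sequences using the recorded boundaries."""
    return [flat[start:end] for start, end in zip(cu_seqlens, cu_seqlens[1:])]


def padding_saved(seqs: List[List[int]]) -> int:
    """Token slots avoided compared with padding every sequence to the longest."""
    padded_slots = len(seqs) * max(len(seq) for seq in seqs)
    return padded_slots - sum(len(seq) for seq in seqs)
```

For three sequences of lengths 3, 1, and 2, padding to the longest would allocate 9 token slots, while the packed buffer holds only the 6 real tokens.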
Check out this Jupyter Notebook to get started with PPO training on a single 24GB L4 GPU (free GPU quota provided by Lightning Studio)!
Quickstart:
Running a PPO example step-by-step:
- Data and Reward Preparation
- Understanding the PPO Example
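For orientation before reading the PPO example, the clipped surrogate objective at the core of PPO can be sketched in a few lines. This is a generic, per-token illustration in plain Python, not code taken from veRL; the function name `ppo_clip_loss` and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Mean PPO clipped surrogate loss (to be minimized)."""
    if not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if len(advantages) == 0:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the current policy and the rollout policy.
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (lower) bound of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)
    # Negate because optimizers minimize, while the surrogate is maximized.
    return -total / len(advantages)
```

When the new and old log-probabilities are equal, the ratio is 1 and the loss reduces to the negative mean advantage; when the ratio moves outside `[1 - clip_eps, 1 + clip_eps]` in the direction that would increase the objective, the clipped term caps the contribution.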
Reproducible algorithm baselines:
For code explanation and advanced usage (extension):
- PPO Trainer and Workers
- Advance Usage and Extension
If you find the project helpful, please cite: