
veRL: Volcano Engine Reinforcement Learning for LLM

veRL is a flexible, efficient, and production-ready RL training framework designed for large language models (LLMs). veRL is the open-source implementation of the HybridFlow paper.

veRL is flexible and easy to use with:

  • Easy extension of diverse RL algorithms: the hybrid programming model combines the strengths of single-controller and multi-controller paradigms to enable flexible representation and efficient execution of complex post-training dataflows, allowing users to build RL dataflows in a few lines of code (see the sketch after this list).

  • Seamless integration of existing LLM infra with modular APIs: decoupled computation and data dependencies enable integration with existing LLM frameworks such as PyTorch FSDP, Megatron-LM, and vLLM. Moreover, users can easily extend it to other LLM training and inference frameworks.

  • Flexible device mapping: Supports various placement of models onto different sets of GPUs for efficient resource utilization and scalability across different cluster sizes.

  • Ready integration with popular HuggingFace models.
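
To make the hybrid programming model concrete, below is a minimal, self-contained sketch of how a single-controller driver could compose a PPO dataflow from modular workers. The class and method names here are hypothetical stand-ins for illustration, not the exact veRL API:

```python
# Toy sketch of a single-controller PPO dataflow. In a real setup each
# worker group would run distributed (e.g. FSDP/Megatron-LM for training,
# vLLM for rollout); here they are plain in-process stubs.
# NOTE: ActorRollout, RewardModel, and their methods are hypothetical names.

class ActorRollout:
    def generate_sequences(self, prompts):
        # Rollout phase: autoregressive generation (stubbed).
        return [{"prompt": p, "response": p + " ...answer"} for p in prompts]

    def update_actor(self, batch):
        # Training phase: PPO policy update (stubbed).
        pass

class RewardModel:
    def compute_reward(self, batch):
        # Score each response (stubbed with a toy length-based reward).
        for sample in batch:
            sample["reward"] = float(len(sample["response"]))
        return batch

def ppo_step(actor_rollout, reward_model, prompts):
    # The driver expresses the whole dataflow as a few sequential calls;
    # each call can dispatch to a distributed worker group underneath.
    batch = actor_rollout.generate_sequences(prompts)
    batch = reward_model.compute_reward(batch)
    actor_rollout.update_actor(batch)
    return batch

print(ppo_step(ActorRollout(), RewardModel(), ["What is RL?"]))
```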

veRL is fast with:

  • State-of-the-art throughput: By seamlessly integrating existing SOTA LLM training and inference frameworks, veRL achieves high generation and training throughput.

  • Efficient actor model resharding with 3D-HybridEngine: eliminates memory redundancy and significantly reduces communication overhead during transitions between training and generation phases (a toy sketch follows).
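
As a toy single-process illustration of the resharding idea (not veRL's actual 3D-HybridEngine), consider regrouping training-layout parameter shards into a full weight matrix for generation on the same devices:

```python
import torch

# Parameters kept as row shards for training are regrouped into one
# full matrix for rollout generation, then could be re-sharded afterwards.
# A real 3D-HybridEngine performs this transition across GPUs while
# avoiding the redundant full copy materialized here.
full_weight = torch.randn(8, 4)
train_shards = list(torch.chunk(full_weight, chunks=4, dim=0))  # FSDP-style shards
gen_weight = torch.cat(train_shards, dim=0)                     # regather for rollout
assert torch.equal(gen_weight, full_weight)
```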

| Documentation | Paper | Slack | Wechat |

News

Key Features

  • FSDP and Megatron-LM for training.
  • vLLM and TGI for rollout generation; SGLang support coming soon.
  • Support for HuggingFace models.
  • Supervised fine-tuning.
  • Reward model training.
  • Reinforcement learning from human feedback with PPO (see the loss sketch after this list).
  • Flash-attention integration and sequence packing.
  • Scales up to 70B models and hundreds of GPUs.
  • Experiment tracking with wandb and mlflow.
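
For reference, the PPO policy update mentioned above optimizes the standard clipped surrogate objective. A minimal PyTorch sketch of that loss, illustrating the algorithm rather than veRL internals:

```python
import torch

def ppo_policy_loss(logprob, old_logprob, advantages, clip_ratio=0.2):
    # Importance ratio between the current policy and the rollout policy.
    ratio = torch.exp(logprob - old_logprob)
    unclipped = ratio * advantages
    # Clipping keeps the update close to the behavior policy.
    clipped = torch.clamp(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * advantages
    # PPO maximizes the surrogate objective, so the loss is its negation.
    return -torch.mean(torch.minimum(unclipped, clipped))
```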

Getting Started

Check out this Jupyter Notebook to get started with PPO training on a single 24GB L4 GPU (FREE GPU quota provided by Lightning Studio)!

Quickstart:

Running a PPO example step-by-step:

Reproducible algorithm baselines:

For code explanation and advanced usage (extension):

Citation

If you find the project helpful, please cite:

Publications Using veRL