8 changes: 8 additions & 0 deletions docs/source/index.md

@@ -100,6 +100,14 @@ features/compatibility_matrix

% Details about running vLLM

:::{toctree}
:caption: Training
:maxdepth: 1

training/trl.md

:::

:::{toctree}
:caption: Inference and Serving
:maxdepth: 1
13 changes: 13 additions & 0 deletions docs/source/training/trl.md

@@ -0,0 +1,13 @@
# Transformers Reinforcement Learning

Transformers Reinforcement Learning (TRL) is a full-stack library that provides a set of tools for training transformer language models with methods such as Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), Reward Modeling, and more. The library is integrated with 🤗 transformers.

Online methods such as GRPO and Online DPO require the model to generate completions during training, and vLLM can be used to produce these completions quickly.

See the guide [vLLM for fast generation in online methods](https://huggingface.co/docs/trl/main/en/speeding_up_training#vllm-for-fast-generation-in-online-methods) in the TRL documentation for more information.
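The snippet below is a minimal sketch of what this looks like in practice, assuming the GRPO trainer; the model name, dataset, and reward function are illustrative placeholders, not part of this change or of vLLM itself.

```python
# Minimal GRPO sketch with vLLM-backed generation (illustrative placeholders only).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Toy reward: prefer completions close to 20 characters long.
def reward_len(completions, **kwargs):
    return [-abs(20 - len(completion)) for completion in completions]

dataset = load_dataset("trl-lib/tldr", split="train")

training_args = GRPOConfig(
    output_dir="Qwen2-0.5B-GRPO",
    use_vllm=True,  # generate completions with vLLM instead of the default HF generation
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

Setting `use_vllm=True` changes only how completions are generated; the rest of the training loop is unchanged.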

:::{seealso}
For more information on the `use_vllm` flag, which you can set in the config of each of these online methods, see:
- [`trl.GRPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/grpo_trainer#trl.GRPOConfig.use_vllm)
- [`trl.OnlineDPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/online_dpo_trainer#trl.OnlineDPOConfig.use_vllm)
:::