Commit be0b399 (parent: b8b0ccb)

Add training doc signposting to TRL (#14439)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

2 files changed: 21 additions, 0 deletions

docs/source/index.md

Lines changed: 8 additions & 0 deletions

```diff
@@ -100,6 +100,14 @@ features/compatibility_matrix
 
 % Details about running vLLM
 
+:::{toctree}
+:caption: Training
+:maxdepth: 1
+
+training/trl.md
+
+:::
+
 :::{toctree}
 :caption: Inference and Serving
 :maxdepth: 1
```

docs/source/training/trl.md

Lines changed: 13 additions & 0 deletions
```diff
@@ -0,0 +1,13 @@
+# Transformers Reinforcement Learning
+
+Transformers Reinforcement Learning (TRL) is a full-stack library providing tools to train transformer language models with methods such as Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), Reward Modeling, and more. The library is integrated with 🤗 transformers.
+
+Online methods such as GRPO or Online DPO require the model to generate completions, and vLLM can be used to generate them.
+
+See the guide [vLLM for fast generation in online methods](https://huggingface.co/docs/trl/main/en/speeding_up_training#vllm-for-fast-generation-in-online-methods) in the TRL documentation for more information.
+
+:::{seealso}
+For more information on the `use_vllm` flag you can pass to the configs of these online methods, see:
+- [`trl.GRPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/grpo_trainer#trl.GRPOConfig.use_vllm)
+- [`trl.OnlineDPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/online_dpo_trainer#trl.OnlineDPOConfig.use_vllm)
+:::
```
