Merged
2 changes: 2 additions & 0 deletions docs/training.md
@@ -103,6 +103,8 @@ This will launch a tmux session with separate panes for the trainer, orchestrato

## Training with `vf.RLTrainer`

> **Note:** `vf.RLTrainer` is intended for educational/demo purposes only and is not actively maintained. For production RL training, please use [`prime-rl`](#training-with-prime-rl) instead.

If you want to hack on new training algorithms and are less concerned with maximum performance or advanced features, you can use the included `RLTrainer` (via `vf-rl`), whose core files are under 1000 lines of code and include only the most essential logic for fairly performant async off-policy training (with a core algorithm similar to `prime-rl`'s).

The included `RLTrainer` is a minimal, hackable training loop based on `transformers.Trainer` that supports both full-parameter finetuning and LoRA training. `RLTrainer` can be viewed as a "baby" `prime-rl` that adopts a similar default training recipe (async CISPO with one-step off-policy overlap), intended for single-node test runs with dense models. The primary files (`trainer.py` and `orchestrator.py`, located in `verifiers/rl/trainer/`) are under 1000 lines of code, and are designed to be a convenient starting point for writing your own training loop.
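The "async with one-step off-policy overlap" recipe mentioned above can be sketched as a producer/consumer loop: the orchestrator generates rollouts for step n while the trainer is still updating on rollouts from step n - 1. The sketch below is a hypothetical illustration of that scheduling idea only, not the actual `verifiers` or `prime-rl` implementation; all names in it are made up.

```python
# Hypothetical sketch of one-step off-policy overlap: a bounded queue of
# size 1 lets the orchestrator run at most one step ahead of the trainer,
# so the trainer consumes rollouts that are at most one policy step stale.
from queue import Queue
from threading import Thread


def orchestrate(num_steps: int, rollout_queue: Queue) -> None:
    # Stand-in for rollout generation; real rollouts would come from
    # environment episodes scored by a reward function.
    for step in range(num_steps):
        rollouts = [f"rollout-{step}-{i}" for i in range(4)]
        rollout_queue.put((step, rollouts))  # blocks once one step is buffered


def train(num_steps: int, rollout_queue: Queue) -> list[int]:
    trained_on = []
    for _ in range(num_steps):
        step, rollouts = rollout_queue.get()  # may lag the generator by one step
        trained_on.append(step)  # stand-in for a gradient update on `rollouts`
    return trained_on


q: Queue = Queue(maxsize=1)  # buffer of one step => at most one-step off-policy
producer = Thread(target=orchestrate, args=(3, q))
producer.start()
steps = train(3, q)
producer.join()
print(steps)  # [0, 1, 2]
```

The `maxsize=1` bound is what makes the overlap "one-step": a larger buffer would allow staler rollouts, while no buffer at all would force fully synchronous (on-policy) training.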
1 change: 1 addition & 0 deletions verifiers/envs/multiturn_env.py
@@ -138,6 +138,7 @@ async def rollout(
state = await self.setup_state(state)
except vf.Error as e:
state["error"] = e
# checks all @vf.stop methods; if any returns True, runs all @vf.cleanup methods
while not await self.is_completed(state):
try:
prompt_messages = await self.get_prompt_messages(state)
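The comment added to `rollout` describes a decorator-registry pattern: `is_completed` checks every method marked as a stop condition, and if any fires, runs every method marked as cleanup. A minimal standalone sketch of how such a registry could work is below; the `stop`/`cleanup` decorators and the `Env` class here are hypothetical illustrations, not the actual `verifiers` API.

```python
# Hypothetical sketch of a stop/cleanup decorator registry. Methods are
# tagged with an attribute by the decorator, then discovered via dir().
def stop(fn):
    fn._is_stop = True  # marks a termination check
    return fn


def cleanup(fn):
    fn._is_cleanup = True  # marks a finalizer to run on termination
    return fn


class Env:
    def _tagged(self, flag: str):
        # Collect bound methods carrying the given marker attribute.
        return [getattr(self, n) for n in dir(self)
                if getattr(getattr(self, n, None), flag, False)]

    def is_completed(self, state: dict) -> bool:
        # Check every @stop method; if any returns True, run all @cleanup
        # methods and report the rollout as completed.
        if any(check(state) for check in self._tagged("_is_stop")):
            for finalize in self._tagged("_is_cleanup"):
                finalize(state)
            return True
        return False


class MyEnv(Env):
    @stop
    def max_turns_reached(self, state: dict) -> bool:
        return state["turn"] >= state["max_turns"]

    @cleanup
    def close_resources(self, state: dict) -> None:
        state["closed"] = True


env = MyEnv()
state = {"turn": 3, "max_turns": 3}
print(env.is_completed(state))  # True
print(state["closed"])          # True
```

Tagging functions with an attribute (rather than keeping a global registry) keeps the decorators inheritance-friendly: subclasses can add or override stop conditions and the `dir()`-based lookup picks them up automatically.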
5 changes: 5 additions & 0 deletions verifiers/rl/trainer/trainer.py
@@ -49,6 +49,11 @@ def __init__(
**kwargs,
):
self.logger = logging.getLogger(__name__)
self.logger.warning(
"RLTrainer is intended for educational/demo purposes only and is not actively "
"maintained. For production RL training, please use prime-rl instead: "
"https://github.com/PrimeIntellect-ai/prime-rl"
)

# model + tokenizer
if isinstance(model, str):
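The change above uses `logger.warning`, which fires on every `RLTrainer` instantiation. A common alternative design for deprecation-style notices is the stdlib `warnings` module, which by default deduplicates repeated warnings from the same call site and lets users filter or escalate them. The class below is a hypothetical standalone sketch of that pattern, not the actual trainer code.

```python
# Hypothetical sketch: emitting the same notice via warnings.warn with
# DeprecationWarning, so downstream users can silence or escalate it
# with the standard warnings filters.
import warnings


class RLTrainerSketch:
    def __init__(self):
        warnings.warn(
            "RLTrainer is intended for educational/demo purposes only; "
            "use prime-rl for production training.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not __init__
        )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every emission for inspection
    RLTrainerSketch()

print(len(caught))  # 1
print(issubclass(caught[0].category, DeprecationWarning))  # True
```

The trade-off: `logger.warning` always reaches configured log handlers, while `DeprecationWarning` is hidden by default outside of `__main__` and test runners, so the logging approach in the actual diff is the louder of the two.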