Merged
2 changes: 2 additions & 0 deletions docs/training.md
@@ -103,6 +103,8 @@ This will launch a tmux session with separate panes for the trainer, orchestrato

## Training with `vf.RLTrainer`

> **Note:** `vf.RLTrainer` is intended for educational/demo purposes only and is not actively maintained. For production RL training, please use [`prime-rl`](#training-with-prime-rl) instead.

If you want to hack on new training algorithms and are less concerned with maximum performance or advanced features, you can use the included `RLTrainer` (via `vf-rl`), whose core files are under 1000 lines of code and include only the most essential logic for fairly performant async off-policy training (with a core algorithm similar to `prime-rl`'s).

The included `RLTrainer` is a minimal, hackable training loop based on `transformers.Trainer` that supports both full-parameter finetuning and LoRA training. `RLTrainer` can be viewed as a "baby" `prime-rl` that adopts a similar default training recipe (async CISPO with one-step off-policy overlap), intended for single-node test runs with dense models. The primary files (`trainer.py` and `orchestrator.py`, located in `verifiers/rl/trainer/`) are under 1000 lines of code, and are designed to be a convenient starting point for writing your own training loop.
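The "async with one-step off-policy overlap" recipe mentioned above can be sketched as a producer/consumer loop: the orchestrator generates rollouts for step n while the trainer is still updating on rollouts from step n - 1. The sketch below is a hypothetical illustration of that scheduling idea only, not the actual `verifiers` or `prime-rl` implementation; all names in it are made up.

```python
# Hypothetical sketch of one-step off-policy overlap: a bounded queue of
# size 1 lets the orchestrator run at most one step ahead of the trainer,
# so the trainer consumes rollouts that are at most one policy step stale.
from queue import Queue
from threading import Thread


def orchestrate(num_steps: int, rollout_queue: Queue) -> None:
    # Stand-in for rollout generation; real rollouts would come from
    # environment episodes scored by a reward function.
    for step in range(num_steps):
        rollouts = [f"rollout-{step}-{i}" for i in range(4)]
        rollout_queue.put((step, rollouts))  # blocks once one step is buffered


def train(num_steps: int, rollout_queue: Queue) -> list[int]:
    trained_on = []
    for _ in range(num_steps):
        step, rollouts = rollout_queue.get()  # may lag the generator by one step
        trained_on.append(step)  # stand-in for a gradient update on `rollouts`
    return trained_on


q: Queue = Queue(maxsize=1)  # buffer of one step => at most one-step off-policy
producer = Thread(target=orchestrate, args=(3, q))
producer.start()
steps = train(3, q)
producer.join()
print(steps)  # [0, 1, 2]
```

The `maxsize=1` bound is what makes the overlap "one-step": a larger buffer would allow staler rollouts, while no buffer at all would force fully synchronous (on-policy) training.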
1 change: 1 addition & 0 deletions verifiers/envs/multiturn_env.py
@@ -138,6 +138,7 @@ async def rollout(
state = await self.setup_state(state)
except vf.Error as e:
state["error"] = e
# checks all @vf.stop methods; if any returns True, runs all @vf.cleanup methods
while not await self.is_completed(state):
try:
prompt_messages = await self.get_prompt_messages(state)
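The comment added to `rollout` describes a decorator-registry pattern: `is_completed` checks every method marked as a stop condition, and if any fires, runs every method marked as cleanup. A minimal standalone sketch of how such a registry could work is below; the `stop`/`cleanup` decorators and the `Env` class here are hypothetical illustrations, not the actual `verifiers` API.

```python
# Hypothetical sketch of a stop/cleanup decorator registry. Methods are
# tagged with an attribute by the decorator, then discovered via dir().
def stop(fn):
    fn._is_stop = True  # marks a termination check
    return fn


def cleanup(fn):
    fn._is_cleanup = True  # marks a finalizer to run on termination
    return fn


class Env:
    def _tagged(self, flag: str):
        # Collect bound methods carrying the given marker attribute.
        return [getattr(self, n) for n in dir(self)
                if getattr(getattr(self, n, None), flag, False)]

    def is_completed(self, state: dict) -> bool:
        # Check every @stop method; if any returns True, run all @cleanup
        # methods and report the rollout as completed.
        if any(check(state) for check in self._tagged("_is_stop")):
            for finalize in self._tagged("_is_cleanup"):
                finalize(state)
            return True
        return False


class MyEnv(Env):
    @stop
    def max_turns_reached(self, state: dict) -> bool:
        return state["turn"] >= state["max_turns"]

    @cleanup
    def close_resources(self, state: dict) -> None:
        state["closed"] = True


env = MyEnv()
state = {"turn": 3, "max_turns": 3}
print(env.is_completed(state))  # True
print(state["closed"])          # True
```

Tagging functions with an attribute (rather than keeping a global registry) keeps the decorators inheritance-friendly: subclasses can add or override stop conditions and the `dir()`-based lookup picks them up automatically.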
5 changes: 5 additions & 0 deletions verifiers/rl/trainer/trainer.py
@@ -49,6 +49,11 @@ def __init__(
**kwargs,
):
self.logger = logging.getLogger(__name__)
self.logger.warning(
"RLTrainer is intended for educational/demo purposes only and is not actively "
"maintained. For production RL training, please use prime-rl instead: "
"https://github.com/PrimeIntellect-ai/prime-rl"
)

# model + tokenizer
if isinstance(model, str):
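The change above uses `logger.warning`, which fires on every `RLTrainer` instantiation. A common alternative design for deprecation-style notices is the stdlib `warnings` module, which by default deduplicates repeated warnings from the same call site and lets users filter or escalate them. The class below is a hypothetical standalone sketch of that pattern, not the actual trainer code.

```python
# Hypothetical sketch: emitting the same notice via warnings.warn with
# DeprecationWarning, so downstream users can silence or escalate it
# with the standard warnings filters.
import warnings


class RLTrainerSketch:
    def __init__(self):
        warnings.warn(
            "RLTrainer is intended for educational/demo purposes only; "
            "use prime-rl for production training.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not __init__
        )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every emission for inspection
    RLTrainerSketch()

print(len(caught))  # 1
print(issubclass(caught[0].category, DeprecationWarning))  # True
```

The trade-off: `logger.warning` always reaches configured log handlers, while `DeprecationWarning` is hidden by default outside of `__main__` and test runners, so the logging approach in the actual diff is the louder of the two.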