
RLMEnv: Make sub-LLM calls work for training #738

Merged
snimu merged 13 commits into main from sebastian/rlm-local-code-fixes-2026-01-16 on Jan 18, 2026

Conversation

@snimu (Contributor) commented on Jan 16, 2026

Description

Align sub‑LLM requests with interleaved rollouts; tighten local RLM worker shutdown

Context
Sub‑LLM requests were hitting the wrong endpoint when interleaved rollouts were enabled, causing vLLM 400 errors and token-accounting issues. Local code execution also left child processes alive on teardown.

What changed

  • Sub‑LLM calls now follow the same interleaved path as the main model: tokenize + /chat/completions/tokens when interleaved_rollouts=True, otherwise standard chat completions (a minimal sketch follows this list).
  • Sub‑LLM sampling args are normalized for token‑prompt mode to match main‑model behavior.
  • Local RLM worker is launched in its own process group and terminated by group on teardown to avoid orphaned children.
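
A minimal sketch of that routing, under stated assumptions: call_sub_llm and normalize_sampling_args are hypothetical names, the tokenize_vllm stub stands in for the repo's helper (its real signature is not shown in this PR), and the token-endpoint payload shape and dropped-argument set are guesses, not the env's actual code.

```python
import httpx
from openai import AsyncOpenAI


async def tokenize_vllm(messages: list[dict], *, model: str) -> list[int]:
    # Stand-in for the repo's tokenize_vllm helper (assumed signature);
    # the real helper renders the chat template into prompt token IDs.
    raise NotImplementedError


def normalize_sampling_args(args: dict) -> dict:
    # Token-prompt mode: strip args that only make sense when the server
    # applies the chat template itself. The dropped set is illustrative.
    return {k: v for k, v in args.items() if k not in {"logprobs", "top_logprobs"}}


async def call_sub_llm(
    messages: list[dict],
    sampling_args: dict,
    *,
    interleaved_rollouts: bool,
    base_url: str,
    model: str,
) -> dict:
    if interleaved_rollouts:
        # Interleaved path: tokenize first, then POST token IDs to the
        # token endpoint so training sees the exact prompt tokens.
        token_ids = await tokenize_vllm(messages, model=model)
        async with httpx.AsyncClient(base_url=base_url) as http:
            resp = await http.post(
                "/chat/completions/tokens",
                json={
                    "model": model,
                    "token_ids": token_ids,
                    **normalize_sampling_args(sampling_args),
                },
            )
            resp.raise_for_status()
            return resp.json()
    # Default path: a standard chat completion; the server templates the
    # messages itself. vLLM typically ignores the API key.
    client = AsyncOpenAI(base_url=base_url, api_key="EMPTY")
    completion = await client.chat.completions.create(
        model=model, messages=messages, **sampling_args
    )
    return completion.model_dump()
```

The point is the branch structure: sub‑LLM calls take the same token-prompt path as the main model whenever interleaved_rollouts=True, so the prompt tokens seen at rollout time match what the trainer receives.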

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes (a rough illustration of the new-session check follows below).
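
As a rough illustration only (not the PR's actual test code), a new-session test can launch a throwaway child the same way the worker is launched and assert that it leads its own process group; POSIX-only, since os.getpgid/os.killpg do not exist on Windows:

```python
import os
import signal
import subprocess
import sys


def test_worker_starts_in_new_session():
    # Hypothetical test shape: spawn a sleeper with start_new_session=True,
    # mirroring how the local worker is launched.
    proc = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        start_new_session=True,
    )
    try:
        # setsid() makes the child the leader of its own process group...
        assert os.getpgid(proc.pid) == proc.pid
        # ...which is distinct from the test process's group.
        assert os.getpgid(proc.pid) != os.getpgid(os.getpid())
    finally:
        os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
        proc.wait()
```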

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Aligns sub‑LLM requests with main-model interleaved flow and improves local worker robustness.

  • Sub‑LLM API: when interleaved_rollouts=True, tokenize via tokenize_vllm and POST to /chat/completions/tokens with normalized sampling args; otherwise use chat.completions.create. Removes logprobs fallback logic and adds message content normalization.
  • Execution/local: launch local worker with start_new_session=True and terminate via os.killpg(.., SIGTERM/SIGKILL) to avoid orphaned children (see the sketch after this note).
  • Interception/typing: pass state into sub‑LLM paths, use ChatMessages types, and record sub‑LLM turns/metrics in trajectory steps.
  • Tests: add coverage for interleaved request path, env var export, new-session start, and process-group kill; adjust fixtures to avoid SIGTERM handler registration.

Written by Cursor Bugbot for commit abd8127. This will update automatically on new commits.
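
A minimal sketch of the launch/teardown pattern from the execution/local bullet above, with hypothetical function names (launch_worker, terminate_worker) and an assumed grace period; POSIX-only:

```python
import os
import signal
import subprocess


def launch_worker(cmd: list[str]) -> subprocess.Popen:
    # start_new_session=True runs setsid() in the child, putting the worker
    # in its own session and process group so it can be signaled as a group.
    return subprocess.Popen(cmd, start_new_session=True)


def terminate_worker(proc: subprocess.Popen, grace_s: float = 5.0) -> None:
    # Signal the whole group so children spawned by the worker (e.g. code
    # execution subprocesses) cannot outlive it as orphans.
    pgid = os.getpgid(proc.pid)
    os.killpg(pgid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        # Escalate if the worker ignores SIGTERM during the grace period.
        os.killpg(pgid, signal.SIGKILL)
        proc.wait()
```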

@cursor cursor bot commented

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@cursor cursor bot commented

Cursor Bugbot has reviewed your changes and found 2 potential issues.

snimu merged commit 8397d49 into main on Jan 18, 2026
6 checks passed