env worker integration #1714

Merged: samsja merged 69 commits into main from env-worker on Feb 9, 2026
Conversation

mikasenghaas (Member) commented on Feb 3, 2026

This PR integrates the new vf.EnvClient and vf.EnvServer into the PRIME-RL orchestrator as a near drop-in replacement for the previous env worker. Some notes on the general design:

  • The orchestrator still loads and owns a copy of each (train/eval) environment. However, this is mainly for data loading purposes (populating the buffer) and for defining the scheduling logic. The orchestrator itself does not run any rollouts
  • Each loaded environment is put into "server mode", meaning that it owns a vf.EnvClient which is used internally to route requests to the connected environment server. Thus, if env.env_client is present, env.run_rollout and env.run_group will execute on the server
  • By default, environments are configured without an address. This leads the orchestrator to auto-spawn one env server per env as a subprocess sidecar and automatically connect a corresponding client. If an address is given, the orchestrator expects to find an environment server for that environment at the address; it will only create the env client and will not spawn a sidecar env server. This pattern can be used to run env servers in separate containers
  • Because we cannot serialize a client object, all calls to the environment (in server mode) now pass a vf.ClientConfig instead of an actual client. The env server creates clients automatically and caches them based on the config. We use the client_idx field in vf.ClientConfig to "round-robin" clients: instead of round-robining the clients themselves, we now round-robin the client configs. Throughout the codebase, essentially only the setup and types changed
  • Environment servers log to files by default, with log.vf_level verbosity. The logs currently live in the orchestrator run directory under /train/<env-name> for training envs and /eval/<env-name> for eval envs
  • I also added a new entrypoint, uv run env-server, which is a lightweight entrypoint to start an environment server based on our EnvConfig. This can be used to first start an env server (e.g. in a container) which the orchestrator later connects to

Example

Starting an RL run with sidecar env servers works as before:

uv run rl @ examples/reverse_text/rl.toml

Optionally, you can first start a separate env server and then let the orchestrator connect to it:

uv run env-server --env.id reverse-text --env.address tcp://127.0.0.1:5000

# in separate terminal
uv run rl @ examples/reverse_text/rl.toml --orchestrator.env '[{"id": "reverse-text", "address": "tcp://127.0.0.1:5000"}]'

Manually tested features

  • reverse-text single-turn example with validation and online evals (uv run rl @ examples/multi_reverse_text/rl.toml)
  • alphabet-sort multi-turn LoRA example (uv run rl @ examples/alphabet_sort/rl.toml)
  • color-codeword multi-turn VLM example (uv run rl @ configs/multimodal/rl_color_codeword.toml)
  • elastic reverse-text with DNS discovery and elastic inference (uv run rl @ configs/elastic/rl.toml)

Notes on migration

  • The AsyncLimiter moved from the EnvWorker to the Scheduler, which now controls the number of group rollout requests per minute (this should exactly match the previous behavior)
  • The ServerDiscovery that previously lived on the env worker was deprecated and moved onto the
  • Eval environments for online evals use the same pattern of remote server execution, unlike the previous pattern where a single subprocess ran all of the evals. However, we still pause the orchestrator and drain pending requests on the scheduler before starting online evals
  • Somewhat unrelated to this PR, but I reuse the InferencePool abstraction for the OPD teacher clients as well, which gets rid of some redundant code. There is more, but I will leave that for later
  • We deprecate eval and synthesize (and all associated tests/ utils/ ...)
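The per-minute limiting now done by the Scheduler behaves roughly like this hand-rolled sketch. The real code uses an AsyncLimiter; this stand-in class only illustrates the semantics (at most N acquisitions per rolling 60-second window).

```python
import asyncio
import time


class PerMinuteLimiter:
    """Rough stand-in for the Scheduler's AsyncLimiter: allows at most
    `max_per_minute` acquisitions per rolling 60-second window."""

    def __init__(self, max_per_minute: int):
        self.max_per_minute = max_per_minute
        self._stamps: list[float] = []

    async def acquire(self) -> None:
        while True:
            now = time.monotonic()
            # Drop timestamps that have aged out of the window.
            self._stamps = [t for t in self._stamps if now - t < 60.0]
            if len(self._stamps) < self.max_per_minute:
                self._stamps.append(now)
                return
            # Sleep until the oldest stamp leaves the window.
            await asyncio.sleep(60.0 - (now - self._stamps[0]))


async def demo() -> int:
    limiter = PerMinuteLimiter(max_per_minute=5)
    for _ in range(3):  # 3 < 5, so no waiting occurs
        await limiter.acquire()
    return len(limiter._stamps)


print(asyncio.run(demo()))  # → 3
```

Since the limiter now sits on the Scheduler rather than inside each env worker, the rate applies to group rollout requests globally, matching the previous behavior.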

Questions / Discussion

  • @samsja Is it fine to remove eval? What exactly does the watcher do that we cannot do in online eval?
  • @JannikSt Does this work for you for running env workers in containers? I guess you would have to parse the orchestrator config for the training and eval envs, start each env in a container, and then "modify" the orchestrator config to include the addresses -- is this feasible?
  • @JannikSt On the env worker we had auto-restarts -- what do we need these for? Would restarts not be handled by k8s? Let me know how we should best handle this, but the vf.EnvClient probably needs some more resilience
  • We currently rely on this branch from verifiers, which allows each env in vf.EnvGroup (which we use for our training envs) to own a custom env client that is used iff the child env is in "server mode". Long-term, we might want to deprecate the use of the env group because it doesn't actually do much for us apart from concatenating the dataset and routing to envs by task
  • On the env server we currently use the vf logger (standard lib). For hosted RL it would probably be nice to have consistent logging and use the PRIME-RL logger. It's not trivial to do this nicely, so I will leave it for later
  • It would be cool to automatically surface all env logs in the tmux session by default. One option would be to tail -F outputs/<run>/train/* for train envs
  • We should probably turn off any logging from verifiers on the orchestrator by default. It rarely gives useful/new information, but I will make this part of another PR

Missing Features

  • Auto-restarts upon crash: Right now, if an env server dies, the env client crashes, which will crash the orchestrator
  • Wait for env server: Right now we only wait 30s for an env server to become available
  • Logging: We are not logging the event loop lag from the env server as of now. Doing so would probably require us to override vf.EnvServer and inject W&B monitoring
  • Concurrency:
  • Multiple servers per env: Not planned for now.

Note

High Risk
Refactors core rollout generation to run via external vf.EnvServer processes and changes inference client plumbing/types, which can impact training stability, performance, and failure modes. Also removes standalone eval/synthesize tooling and CI integration tests, which may reduce coverage and change user workflows.

Overview
The orchestrator now runs rollouts through verifiers EnvServer sidecars (or remote servers via EnvConfig.address) by wiring each env to a vf.EnvClient, replacing the previous per-env worker subprocess implementation and moving rate limiting into the Scheduler.

Online evals are reworked to evaluate via the same env-server mechanism (no eval subprocess), with new eval_utils/vf_utils helpers, and EnvConfig gains address plus orchestrator-managed extra_env_kwargs. The PR also removes the standalone eval and synthesize entrypoints/configs and drops the GitHub Actions integration-test job, while adding a new env-server CLI entrypoint and updating docs/config examples accordingly.

Written by Cursor Bugbot for commit 02a29d3.

@mikasenghaas mikasenghaas requested a review from samsja February 6, 2026 13:10
vf_logger.handlers.clear()
vf_logger.addHandler(InterceptHandler(prefix=prefix))
vf_logger.setLevel(level.upper())
vf_logger.propagate = False

Unused function intercept_vf_logging defined but never called

Low Severity

The function intercept_vf_logging is defined in vf_utils.py but is never called anywhere in the codebase. The grep search shows the only match is its definition. The orchestrator uses vf.setup_logging(level="CRITICAL") instead. This appears to be dead code that was added but never wired up.


willccbb and others added 2 commits February 8, 2026 17:58
The health check was using `model_name` which gets updated to the LoRA
adapter name after training starts. New inference servers only have the
base model, so they would fail health checks and never get added to the
pool.

Store the original model name as `base_model_name` and use it for health
checks, allowing new servers to be discovered and have the LoRA adapter
loaded on them.

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


self._clients = clients
self._admin_clients = admin_clients
self._skip_model_check = skip_model_check
self._idx_to_client = {client.client_idx: client for client in clients}

Unused _idx_to_client dict never accessed

Low Severity

The _idx_to_client dict is created in StaticInferencePool.__init__ but is never read or accessed anywhere in the class or codebase. This is dead code that adds unnecessary memory allocation and computation.


@samsja samsja merged commit 31b48b8 into main Feb 9, 2026
8 checks passed