Merged
Conversation
install_env() calls get_logger() which requires the logger to be set up first. This was missing in env-server but present in orchestrator.
| vf_logger.handlers.clear() | ||
| vf_logger.addHandler(InterceptHandler(prefix=prefix)) | ||
| vf_logger.setLevel(level.upper()) | ||
| vf_logger.propagate = False |
There was a problem hiding this comment.
Unused function intercept_vf_logging defined but never called
Low Severity
The function intercept_vf_logging is defined in vf_utils.py but is never called anywhere in the codebase. The grep search shows the only match is its definition. The orchestrator uses vf.setup_logging(level="CRITICAL") instead. This appears to be dead code that was added but never wired up.
The health check was using `model_name` which gets updated to the LoRA adapter name after training starts. New inference servers only have the base model, so they would fail health checks and never get added to the pool. Store the original model name as `base_model_name` and use it for health checks, allowing new servers to be discovered and have the LoRA adapter loaded on them.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| self._clients = clients | ||
| self._admin_clients = admin_clients | ||
| self._skip_model_check = skip_model_check | ||
| self._idx_to_client = {client.client_idx: client for client in clients} |
There was a problem hiding this comment.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.


This PR integrates the new
vf.EnvClientandvf.EnvServerinto the PRIME-RL orchestrator -- as a ~drop-in replacement for the previous env worker. Some notes on the general designvf.EnvClientwhich is used internally to route request to the connected environment server. Thus, ifenv.env_clientis present,env.run_rolloutandenv.run_groupwill execute on the server.address. This will lead to the orchestrator auto-spawning one env server per-env as a subprocess sidecar and automatically connect a corresponding client. If an address is given, the orchestrator expects to find an environment server corresponding to this environment at the address. It will only create the env client but not spawn sidecar an env server. This pattern can be used to run env servers in separate containersvf.ClientConfiginstead of an actual client now. The env server creates clients automatically and caches based on the config. We use theclient_idxfield in thevf.ClientConfigto be able to "round-robin" clients -- basically, instead of round-robining the clients, we now round-robin the client configs. Throughout the codebase basically just the setup and types changedlog.vf_levelverbosity. The logs are currently in the orchestrator run directory under/train/<env-name>for training envs and/eval/<env-name>for eval envsuv run env-serverwhich is a light-weight entrypoint to start an environment server based on ourEnvConfig. This can be used to first start an env server (e.g. in container) which the orchestrator later connects toExample
Starting an RL run with side-cared env servers works as befre
Optionally, you can first start a separate env server, and then let the orchestrator connect to it
Manually tested features
reverse-textsingle-turn example with validation and online evals (uv run rl @ examples/multi_reverse_text/rl.toml)alphabet-sortmulti-turn LoRA example (uv run rl @ examples/alphabet_sort/rl.toml)color-codewordmulti-turn VLM example (uv run rl @ configs/multimodal/rl_color_codeword.toml)elasticreverse-text with DNS discovery and elastic inference (uv run rl @ configs/elastic/rl.toml)Notes on migration
AsyncLimitermoved from theEnvWorkerto theSchedulerwhich and control the number of group rollout requests per minute (this should exactly match the previous behavior)ServerDiscoverythat lived on the env worker before got deprecated and moved onto theInferencePoolabstraction for the OPD teacher clients as well which gets rid of some redundant code. There is more but I will leave that for laterevalandsynthesize(and all associated tests/ utils/ ...)Questions/ Discussion
vf.EnvClientprobably needs some more resiliencevf.EnvGroup(which we use for our training envs) to own a custom env client that is used iif the child env is in "server mode". Long-term we might want to depcreate the use of the env group because it doesn't actually do that much for us apart from concatenating the dataset and routing to envs by tasktail -F outputs/<run>/train/*for train envsMissing Features
vf.EnvServerand inject W&B monitoringNote
High Risk
Refactors core rollout generation to run via external
vf.EnvServerprocesses and changes inference client plumbing/types, which can impact training stability, performance, and failure modes. Also removes standaloneeval/synthesizetooling and CI integration tests, which may reduce coverage and change user workflows.Overview
The orchestrator now runs rollouts through
verifiersEnvServersidecars (or remote servers viaEnvConfig.address) by wiring each env to avf.EnvClient, replacing the previous per-env worker subprocess implementation and moving rate limiting into theScheduler.Online evals are reworked to evaluate via the same env-server mechanism (no eval subprocess), with new
eval_utils/vf_utilshelpers, andEnvConfiggainsaddressplus orchestrator-managedextra_env_kwargs. The PR also removes the standaloneevalandsynthesizeentrypoints/configs and drops the GitHub Actions integration-test job, while adding a newenv-serverCLI entrypoint and updating docs/config examples accordingly.Written by Cursor Bugbot for commit 02a29d3. This will update automatically on new commits. Configure here.