Releases: PrimeIntellect-ai/verifiers
v0.1.11.dev0
v0.1.10
Verifiers v0.1.10 Release Notes
Date: 02/10/2026
Full Changelog: v0.1.9...v0.1.10
Highlights since v0.1.9
- Expanded environment support with OpenEnv and BrowserEnv integrations, env worker plumbing, and continued improvements to
CliAgentEnv/RLMEnvsandbox reliability and customization hooks. - Upgraded evaluation ergonomics with resumed evals, improved TUI info/log presentation, better rollout/token tracking, and non-TUI overflow rendering fixes.
- Improved reliability across model/runtime boundaries with timeout hardening, safer sandbox lifecycle behavior, and richer error/metadata handling.
- Modernized workspace setup and contributor workflows (
vf-setupendpoint/config updates, GEPA config support, Prime CLI refactor, skills scaffolding, and AGENTS guidance updates). - Added opencode harbor enhancements, including TITO support, tunnel sync stop behavior, and terminal-bench task coverage.
Changes included in v0.1.10 (since v0.1.9)
Environment, rollout, and runtime improvements
- openenv integration (#829)
- Add Browser Env Integration (#732)
- resume evals (#803)
- add Client Pool (#815)
- RLM: Eager sandbox creation, conditional pip install (#834)
- RLM: Add RLMEnv sandbox hooks for safer customization (#849)
- RLM: Make FIFO IO non-blocking (#850)
CliAgentEnv: addSandboxMixin, refactorInterceptionServer(#847)- rlm: migrate sandbox executor to SandboxMixin (#875)
- env worker integration (#832)
- track vf + env version in metadata (#881)
- handle empty metrics (#855)
- move sanitize_metadata out of save_metadata (#852)
- improve env client timeouts (#872)
- Fix ty logger protocol typing in sandbox retry setup (#835)
- Fix vf-eval concurrent rollout label to use effective cap (#836)
Evaluation UX, logging, and metrics
- Add robust token usage tracking (#858)
- Tighten vf-tui info preview formatting and typing checks (#830)
- Add subtle
--debughint beneath Logs panel (#824) - Fix vf-eval non-TUI live overflow rendering (#883)
- misc logging improvs (#882)
Setup, CLI, and configuration
- vf-setup: prefer endpoints.toml, rename configs/lab->configs/rl, add GEPA configs, deprecate --vf-rl (#859)
- Support long endpoint field names in TOML registries (#861)
- prime CLI refactor (vf) (#870)
- refactor: split RL trainer into optional in-repo verifiers-rl package (#843)
- move rlm secrets out of vf and into research-environments (#856)
Documentation, workflows, and skills
- Compile AGENTS docs from modular assets and make guidance concrete (#857)
- skills setup (#873)
- Strengthen lab AGENTS env-development guardrails (#876)
- Clarify MCPEnv is for global read-only MCP servers (#838)
- docs: remove parser-centric guidance from environment READMEs (#839)
- docs: remove parser field from env init README template (#840)
- chore: enforce ruff formatting and improve dev tooling docs (#845)
Integrations and environment packages
- openenv: default template proj/ path and simplify prompt renderer signatures (#853)
- remove vf pin in
opencode_harbor(#844) - opencode env: TITO support, tunnel sync stop, add terminal-bench (#874)
- ci: skip terminus_harbor in test-envs (#846)
- fix math rubric timeouts (#831)
- Fix for dir resolutions (#879)
- Update browse-environments freshness and quality priorities (#884)
- Clarify agent skill handling (#886)
v0.1.10.dev5
Verifiers v0.1.10.dev5 Release Notes
Date: 02/10/2026
Full Changelog: v0.1.10.dev4...v0.1.10.dev5
Highlights since v0.1.10.dev4
- Improved environment worker reliability and metadata tracking by migrating sandbox execution internals and recording verifiers/environment versions in rollout metadata.
- Polished evaluation UX with a fix for non-TUI live overflow rendering and broader logging improvements.
- Added and refined developer-facing guidance for skills and agent workflows, including clearer skill handling and browse-environment quality priorities.
- Extended opencode harbor support with TITO support, tunnel sync stop behavior, and a terminal-bench task addition.
Incremental changes since v0.1.10.dev4
- env worker integration (#832)
- track vf + env version in metadata (#881)
- misc logging improvs (#882)
- rlm: migrate sandbox executor to SandboxMixin (#875)
- opencode env: TITO support, tunnel sync stop, add terminal-bench (#874)
- Fix vf-eval non-TUI live overflow rendering (#883)
- Fix for dir resolutions (#879)
- Update browse-environments freshness and quality priorities (#884)
- Clarify agent skill handling (#886)
v0.1.10.dev4
Verifiers v0.1.10.dev4 Release Notes
Date: 02/09/2026
Full Changelog: v0.1.10.dev3...v0.1.10.dev4
Highlights since v0.1.10.dev3
- Refactored the
vfCLI command surface around Prime CLI integration and plugin wiring. - Improved environment client timeout handling across worker and ZMQ clients, with regression coverage in timeout tests.
- Added skills setup scaffolding and related docs updates for Prime Lab setup and workflow guidance.
- Tightened AGENTS guidance for environment development guardrails in Lab-facing docs.
Incremental changes since v0.1.10.dev3
v0.1.10.dev3
Verifiers v0.1.10.dev3 Release Notes
Date: 02/08/2026
Full Changelog: v0.1.9...v0.1.10.dev3
Highlights since v0.1.9
- Added new environment capabilities, including OpenEnv integration, BrowserEnv integration, and env server support for more flexible tool and environment workflows.
- Expanded evaluation UX with eval TUI, copy mode, improved logs/debug display, rollout token usage tracking, and richer saved-output rendering for tool calls.
- Introduced and iterated on RLMEnv improvements: tool partitioning (
tools,root_tools,sub_tools), better stop/error propagation, prompt/verbosity controls, safer sandbox lifecycle handling, and new sandbox hooks for customization. - Improved reliability across execution and infrastructure paths via retries for infrastructure and model-response errors, better auth/overlong prompt handling for OpenRouter, and cleanup fixes to avoid task/sandbox leakage.
- Modernized setup and training ergonomics with
vf-setupconfig changes (endpoints.toml,configs/rl, GEPA configs), support for long TOML endpoint field names, and an optional in-repoverifiers-rlpackage split. - Hardened runtime internals with
CliAgentEnvsandbox/interception refactors, client pooling, non-blocking FIFO IO for RLM, and metadata/metrics handling fixes. - Added broader OpenEnv ecosystem support and examples (e.g.,
openenv_echo,openenv_textarena,opencode_harbor) with updated version requirements.
Incremental changes since v0.1.10.dev2
- Compile AGENTS docs from modular assets and make guidance concrete (#857)
- vf-setup: prefer endpoints.toml, rename configs/lab->configs/rl, add GEPA configs, deprecate --vf-rl (#859)
- Support long endpoint field names in TOML registries (#861)
- Add robust token usage tracking (#858)
- move rlm secrets out of vf and into research-environments (#856)
CliAgentEnv: addSandboxMixin, refactorInterceptionServer(#847)- handle empty metrics (#855)
- move sanitize_metadata out of save_metadata (#852)
- openenv: default template proj/ path and simplify prompt renderer signatures (#853)
- refactor: split RL trainer into optional in-repo verifiers-rl package (#843)
- add Client Pool (#815)
- chore: enforce ruff formatting and improve dev tooling docs (#845)
- RLM: Make FIFO IO non-blocking (#850)
- RLM: Add RLMEnv sandbox hooks for safer customization (#849)
- RLM: Eager sandbox creation, conditional pip install (#834)
- ci: skip terminus_harbor in test-envs (#846)
- resume evals (#803)
- remove vf pin in
opencode_harbor(#844) - fix math rubric timeouts (#831)
- docs: remove parser-centric guidance from environment READMEs (#839)
- openenv integration (#829)
- Fix ty logger protocol typing in sandbox retry setup (#835)
- docs: remove parser field from env init README template (#840)
- Clarify MCPEnv is for global read-only MCP servers (#838)
- Fix vf-eval concurrent rollout label to use effective cap (#836)
- Tighten vf-tui info preview formatting and typing checks (#830)
- Add subtle
--debughint beneath Logs panel (#824)
v0.1.10.dev2
Verifiers v0.1.10.dev1 Release Notes
Date: 02/04/2026
Full Changelog: v0.1.10.dev0...v0.1.10.dev1
Changes since v0.1.10.dev0
- info oaitools fix (#821)
- Capture stdout/stderr for live display (#819)
- track token usage in eval (#816)
- RLM: show full user message (#818)
- remove filesystem info from rlm system prompts (#817)
- RLM: add prompt verbosity parameters (#814)
- re-raise auth errs + fix overlong prompt err for openrouter (#813)
- add default sandbox_labels to rlm-secrets (#810)
- Improve vf-eval display (#809)
- Increase Sandbox Default Thread Worker Count (#807)
- Tool content validation (#806)
- env server (#799)
- Add Browser Env Integration (#732)
- adjust CliAgentEnv sandbox creation timeout + remove DummyHarborEnv (#804)
- verifiers: fix tool call rendering from saved outputs (#802)
- RLM: remove code jail -> simplify code (#800)
- Add sync bulk sandbox teardown for RLM env (#798)
- overhaul saving outputs (#774)
- Propagate RLM stop errors from root and sub tools (#797)
- clean up on task cancelation to avoid resource leakage (#795)
CliAgentEnv: teardown sandboxes via bulk delete (#796)- RLM: Fix trajectory collision (#786)
- lazy import datasets (#794)
- add rLLM integration to docs
- cancel outstanding tasks if one task raises in
generate(#793) - revert wiki-search
- update environments/README.md (#790)
- util for enforcing env vars are set (#789)
- return last result if retries exhausted (#782)
- warning log in RLTrainer (#783)
- RLM: Simplify code (#781)
- hello world tasks for TerminusHarborEnv and OpenCodeHarborEnv (#775)
- fix retry for invalid model response errors (#778)
- Make local RLM REPL concurrency configurable (#777)
- RLM: re-enable Sandboxes for both Python and Bash (#776)
- fix alphabet-sort
- fix alphabet-sort
- raise on empty response error (openrouter) to trigger retries (#772)
- mirror cli in toml config (#773)
- fix double save results in vf-eval (#771)
- remove log file for cli agent env (#770)
- eval --debug mode to skip Rich (#769)
- tools eval example
- Harbor examples (#766)
- Feature: Add tools metadata for eval viewer (#767)
- expose sandbox labels in
SandboxEnvandCliAgentEnv(#768) - RLM env stop condition fix (#757)
- lazy init locked chromadb instance in wiki-search (#765)
- created the rlm_secrets environment (#763)
- Move RLM system prompt into first user prompt (#764)
- gepa dep
- integrated gepa training, ui to track (#747)
- prime tunnel in cliagentenv (#746)
- RLM: make bash REPL default, keep Python REPL optional (#758)
- Sebastian/rlm file system 2026 01 20 (#756)
- eval tui (#735)
- multi-env evals config (#734)
- Add retry support for infrastructure errors in vf-eval (#750)
- RLM: tools, sub_tools, root_tools (#749)
- optional DatasetBuilder pattern (#739)
- Add copy mode to vf-tui (#745)
- RLMEnv: Make sub-LLM calls work for training (#738)
v0.1.10.dev1
Verifiers v0.1.10.dev1 Release Notes
Date: 02/04/2026
Full Changelog: v0.1.10.dev0...v0.1.10.dev1
Changes since v0.1.10.dev0
- info oaitools fix (#821)
- Capture stdout/stderr for live display (#819)
- track token usage in eval (#816)
- RLM: show full user message (#818)
- remove filesystem info from rlm system prompts (#817)
- RLM: add prompt verbosity parameters (#814)
- re-raise auth errs + fix overlong prompt err for openrouter (#813)
- add default sandbox_labels to rlm-secrets (#810)
- Improve vf-eval display (#809)
- Increase Sandbox Default Thread Worker Count (#807)
- Tool content validation (#806)
- env server (#799)
- Add Browser Env Integration (#732)
- adjust CliAgentEnv sandbox creation timeout + remove DummyHarborEnv (#804)
- verifiers: fix tool call rendering from saved outputs (#802)
- RLM: remove code jail -> simplify code (#800)
- Add sync bulk sandbox teardown for RLM env (#798)
- overhaul saving outputs (#774)
- Propagate RLM stop errors from root and sub tools (#797)
- clean up on task cancelation to avoid resource leakage (#795)
CliAgentEnv: teardown sandboxes via bulk delete (#796)- RLM: Fix trajectory collision (#786)
- lazy import datasets (#794)
- add rLLM integration to docs
- cancel outstanding tasks if one task raises in
generate(#793) - revert wiki-search
- update environments/README.md (#790)
- util for enforcing env vars are set (#789)
- return last result if retries exhausted (#782)
- warning log in RLTrainer (#783)
- RLM: Simplify code (#781)
- hello world tasks for TerminusHarborEnv and OpenCodeHarborEnv (#775)
- fix retry for invalid model response errors (#778)
- Make local RLM REPL concurrency configurable (#777)
- RLM: re-enable Sandboxes for both Python and Bash (#776)
- fix alphabet-sort
- fix alphabet-sort
- raise on empty response error (openrouter) to trigger retries (#772)
- mirror cli in toml config (#773)
- fix double save results in vf-eval (#771)
- remove log file for cli agent env (#770)
- eval --debug mode to skip Rich (#769)
- tools eval example
- Harbor examples (#766)
- Feature: Add tools metadata for eval viewer (#767)
- expose sandbox labels in
SandboxEnvandCliAgentEnv(#768) - RLM env stop condition fix (#757)
- lazy init locked chromadb instance in wiki-search (#765)
- created the rlm_secrets environment (#763)
- Move RLM system prompt into first user prompt (#764)
- gepa dep
- integrated gepa training, ui to track (#747)
- prime tunnel in cliagentenv (#746)
- RLM: make bash REPL default, keep Python REPL optional (#758)
- Sebastian/rlm file system 2026 01 20 (#756)
- eval tui (#735)
- multi-env evals config (#734)
- Add retry support for infrastructure errors in vf-eval (#750)
- RLM: tools, sub_tools, root_tools (#749)
- optional DatasetBuilder pattern (#739)
- Add copy mode to vf-tui (#745)
- RLMEnv: Make sub-LLM calls work for training (#738)
v0.1.10.dev0
v0.1.9.post3
Verifiers v0.1.9 Release Notes
Date: 01/08/2026
Verifiers v0.1.9 introduces several new experimental environments, monitor rubrics for automatic metrics collection, improved error handling, and documentation overhaul.
Post-release update:
- Tweaks to setup script (post1).
- Fix for exporting setup script (post0).
- Fix for gitignore section in setup script (post2).
- Tweaks to setup script and endpoint defaults (post3).
Highlights
-
RLMEnv (Experimental): New environment implementing the Recursive Language Model (RLM) inference strategy, where language models decompose and recursively interact with input data through sandboxed REPL environments. Supports sub-LLM calls via
llm_batch()function intercepted through HTTP proxy. See RLM paper. -
GymEnv (Experimental): Universal Gym-compatible environment runner for standard RL gymnasium environments. Enables training on classic control tasks and custom Gym environments.
-
CliAgentEnv & HarborEnv (Experimental): New environments for running custom agent code in sandboxes (
CliAgentEnv) and loading Harbor-format tasks (HarborEnv). -
MCPEnv (Experimental): Environment for Model Context Protocol (MCP) server integration.
-
Monitor Rubrics: Each environment now automatically includes a monitor rubric that tracks environment-specific metrics without affecting rewards (weight=0). For example:
MultiTurnEnv:num_turnsToolEnv:tool_call_countSandboxEnv:sandbox_call_count,sandbox_total_time_seconds,sandbox_mean_time_secondsPythonEnv:repl_call_count,repl_total_time_seconds,repl_mean_time_secondsRLMEnv: Sub-LLM metrics and more
-
Improved Error Handling: New error chain helpers, better error propagation through rollouts, and
abort_on_code_timeoutsupport for sandbox environments. -
Documentation Overhaul: Complete reorganization of documentation with new Mintlify-based docs, improved examples, and automatic docs sync workflow.
New Features
Environments
- Add
get_sandbox_requesthook for per-rollout sandbox customization (#699) - Expose
render_completionandadd_trajectory_stepmethods with private/final guardrails (#679) - Add
final_messagespattern for cleaner message handling (#677) - Support for token-in vLLM endpoint (#626)
- Static
make_datasetfunction for environments (#683) - Add
alphabet-sortexample environment (#695) system_promptis now prepended to existing prompts that don't already start with a system message
Evaluation & Training
- Optionally enable independent per-rollout scoring: run and score rollouts independently rather than only in groups (#694)
vf-tuiimprovements: regex search modal and run details panel (#705)- Log eventloop lag during
vf-eval(#687) - Log timings in
vf-eval(#686) - Show rolling average as tqdm postfix (#693)
- Option to bypass scoring for faster iteration (#645)
- Add
trajectory_idto TrajectoryStep (#675)
Rubrics
- Add RLM monitor rubric for sub-LLM metrics (#698)
- Improvements to math rubric with better timeout handling (#657)
- JudgeRubric now accepts optional
stateargument (#684)
Error Handling
- Helpers for error chains (#649)
- Better error handling with
abort_on_code_timeout(#659) - Handle all truncation cases (#637)
- Raise
ModelErrorwhenresponse.choicesisNone(#640) - Apply
stop_errorspattern to StatefulToolEnv for parse/call errors (#618) - Normalize messages from sub-LLM calls to prevent errors (#664)
Bug Fixes
- Fix tool duplication when calling
add_toolonToolEnvwith shared list reference - Fix
args_to_skipvalidation failure for dict type parameters inStatefulToolEnv(#674) - Fix empty slice handling (#701)
- Fix wiki-search environment (#697)
- Fix tool test environment (#692)
- Fix PythonEnv deadlock (#652)
- Fix auto-format dataset for
message_type=completions(#624) - Fix math verify timeout (#620)
- Fix sub-LLM metrics and context warnings
pip_install_packages=""no longer breaks sandbox (#633)- Remove prompt logprobs to reduce memory usage (#666)
- Warn when ignoring system prompt/few-shot with
promptpresent (#668) - Handle empty completions in
parse_answer(#672)
Infrastructure & Documentation
- Ensure integrations can be installed via full path (#704)
- Reorganize third-party env integrations (TextArena, ReasoningGym, etc.) (#682)
- Experimental folder structure for newer environments (#643)
- Overhaul docs with example configs (#700)
- Update docs for v0.1.8 API (#670)
- Add automatic docs sync workflow (#628)
- Redirect RTD to shared Mintlify docs (#654)
- Dynamic logger names (#639)
- Use threading for sandbox client (#638)
- Bump
prime-sandboxes>=2.7.0(#660)
vf-setup Command
The vf-setup command bootstraps a verifiers training workspace:
Default behavior (no flags):
- Creates
configs/andenvironments/directories - Downloads
AGENTS.md,CLAUDE.md, andenvironments/AGENTS.mdfor AI coding assistants - Downloads
configs/endpoints.py(API endpoint configuration) - Downloads lab configs for quick experimentation (
configs/lab/*.toml)
With --prime-rl:
- Installs prime-rl and syncs dependencies
- Installs all environments from
environments/into the prime-rl workspace - Downloads prime-rl-specific configs to
configs/prime-rl/
With --vf-rl:
- Downloads
configs/zero3.yaml(DeepSpeed config) - Downloads vf-rl configs to
configs/vf-rl/
With --skip-agents-md:
- Skips downloading
AGENTS.md,CLAUDE.md, andenvironments/AGENTS.md
Migration Notes
- Environments now automatically include monitor rubrics which track default class-specific metrics. If you were manually adding metrics for
num_turns,tool_call_count, etc., these are now provided automatically. - Third-party integrations (TextArena, ReasoningGym) have been moved to
verifiers.envs.integrations. - Experimental environments (GymEnv, MCPEnv, CliAgentEnv, HarborEnv, RLMEnv) are now in
verifiers.envs.experimentaland require explicit imports orverifiers[all]installation.
Full Changelog: v0.1.8.post2...v0.1.9
v0.1.9.post2
Verifiers v0.1.9 Release Notes
Date: 01/08/2026
Verifiers v0.1.9 introduces several new experimental environments, monitor rubrics for automatic metrics collection, improved error handling, and documentation overhaul.
Post-release update:
- Tweaks to setup script (post1).
- Fix for exporting setup script (post0).
- Fix for gitignore section in setup script (post2).
Highlights
-
RLMEnv (Experimental): New environment implementing the Recursive Language Model (RLM) inference strategy, where language models decompose and recursively interact with input data through sandboxed REPL environments. Supports sub-LLM calls via
llm_batch()function intercepted through HTTP proxy. See RLM paper. -
GymEnv (Experimental): Universal Gym-compatible environment runner for standard RL gymnasium environments. Enables training on classic control tasks and custom Gym environments.
-
CliAgentEnv & HarborEnv (Experimental): New environments for running custom agent code in sandboxes (
CliAgentEnv) and loading Harbor-format tasks (HarborEnv). -
MCPEnv (Experimental): Environment for Model Context Protocol (MCP) server integration.
-
Monitor Rubrics: Each environment now automatically includes a monitor rubric that tracks environment-specific metrics without affecting rewards (weight=0). For example:
MultiTurnEnv:num_turnsToolEnv:tool_call_countSandboxEnv:sandbox_call_count,sandbox_total_time_seconds,sandbox_mean_time_secondsPythonEnv:repl_call_count,repl_total_time_seconds,repl_mean_time_secondsRLMEnv: Sub-LLM metrics and more
-
Improved Error Handling: New error chain helpers, better error propagation through rollouts, and
abort_on_code_timeoutsupport for sandbox environments. -
Documentation Overhaul: Complete reorganization of documentation with new Mintlify-based docs, improved examples, and automatic docs sync workflow.
New Features
Environments
- Add
get_sandbox_requesthook for per-rollout sandbox customization (#699) - Expose
render_completionandadd_trajectory_stepmethods with private/final guardrails (#679) - Add
final_messagespattern for cleaner message handling (#677) - Support for token-in vLLM endpoint (#626)
- Static
make_datasetfunction for environments (#683) - Add
alphabet-sortexample environment (#695) system_promptis now prepended to existing prompts that don't already start with a system message
Evaluation & Training
- Optionally enable independent per-rollout scoring: run and score rollouts independently rather than only in groups (#694)
vf-tuiimprovements: regex search modal and run details panel (#705)- Log eventloop lag during
vf-eval(#687) - Log timings in
vf-eval(#686) - Show rolling average as tqdm postfix (#693)
- Option to bypass scoring for faster iteration (#645)
- Add
trajectory_idto TrajectoryStep (#675)
Rubrics
- Add RLM monitor rubric for sub-LLM metrics (#698)
- Improvements to math rubric with better timeout handling (#657)
- JudgeRubric now accepts optional
stateargument (#684)
Error Handling
- Helpers for error chains (#649)
- Better error handling with
abort_on_code_timeout(#659) - Handle all truncation cases (#637)
- Raise
ModelErrorwhenresponse.choicesisNone(#640) - Apply
stop_errorspattern to StatefulToolEnv for parse/call errors (#618) - Normalize messages from sub-LLM calls to prevent errors (#664)
Bug Fixes
- Fix tool duplication when calling
add_toolonToolEnvwith shared list reference - Fix
args_to_skipvalidation failure for dict type parameters inStatefulToolEnv(#674) - Fix empty slice handling (#701)
- Fix wiki-search environment (#697)
- Fix tool test environment (#692)
- Fix PythonEnv deadlock (#652)
- Fix auto-format dataset for
message_type=completions(#624) - Fix math verify timeout (#620)
- Fix sub-LLM metrics and context warnings
pip_install_packages=""no longer breaks sandbox (#633)- Remove prompt logprobs to reduce memory usage (#666)
- Warn when ignoring system prompt/few-shot with
promptpresent (#668) - Handle empty completions in
parse_answer(#672)
Infrastructure & Documentation
- Ensure integrations can be installed via full path (#704)
- Reorganize third-party env integrations (TextArena, ReasoningGym, etc.) (#682)
- Experimental folder structure for newer environments (#643)
- Overhaul docs with example configs (#700)
- Update docs for v0.1.8 API (#670)
- Add automatic docs sync workflow (#628)
- Redirect RTD to shared Mintlify docs (#654)
- Dynamic logger names (#639)
- Use threading for sandbox client (#638)
- Bump
prime-sandboxes>=2.7.0(#660)
vf-setup Command
The vf-setup command bootstraps a verifiers training workspace:
Default behavior (no flags):
- Creates
configs/andenvironments/directories - Downloads
AGENTS.md,CLAUDE.md, andenvironments/AGENTS.mdfor AI coding assistants - Downloads
configs/endpoints.py(API endpoint configuration) - Downloads lab configs for quick experimentation (
configs/lab/*.toml)
With --prime-rl:
- Installs prime-rl and syncs dependencies
- Installs all environments from
environments/into the prime-rl workspace - Downloads prime-rl-specific configs to
configs/prime-rl/
With --vf-rl:
- Downloads
configs/zero3.yaml(DeepSpeed config) - Downloads vf-rl configs to
configs/vf-rl/
With --skip-agents-md:
- Skips downloading
AGENTS.md,CLAUDE.md, andenvironments/AGENTS.md
Migration Notes
- Environments now automatically include monitor rubrics which track default class-specific metrics. If you were manually adding metrics for
num_turns,tool_call_count, etc., these are now provided automatically. - Third-party integrations (TextArena, ReasoningGym) have been moved to
verifiers.envs.integrations. - Experimental environments (GymEnv, MCPEnv, CliAgentEnv, HarborEnv, RLMEnv) are now in
verifiers.envs.experimentaland require explicit imports orverifiers[all]installation.
Full Changelog: v0.1.8.post2...v0.1.9