Skip to content

Releases: PrimeIntellect-ai/verifiers

v0.1.11.dev0

14 Feb 11:34
0f1e334

Choose a tag to compare

Verifiers v0.1.11.dev0 Release Notes

Date: 02/14/2026

Full Changelog: v0.1.10...v0.1.11.dev0

v0.1.10

11 Feb 00:33
ccef044

Choose a tag to compare

Verifiers v0.1.10 Release Notes

Date: 02/10/2026

Full Changelog: v0.1.9...v0.1.10

Highlights since v0.1.9

  • Expanded environment support with OpenEnv and BrowserEnv integrations, env worker plumbing, and continued improvements to CliAgentEnv/RLMEnv sandbox reliability and customization hooks.
  • Upgraded evaluation ergonomics with resumed evals, improved TUI info/log presentation, better rollout/token tracking, and non-TUI overflow rendering fixes.
  • Improved reliability across model/runtime boundaries with timeout hardening, safer sandbox lifecycle behavior, and richer error/metadata handling.
  • Modernized workspace setup and contributor workflows (vf-setup endpoint/config updates, GEPA config support, Prime CLI refactor, skills scaffolding, and AGENTS guidance updates).
  • Added opencode harbor enhancements, including TITO support, tunnel sync stop behavior, and terminal-bench task coverage.

Changes included in v0.1.10 (since v0.1.9)

Environment, rollout, and runtime improvements

  • openenv integration (#829)
  • Add Browser Env Integration (#732)
  • resume evals (#803)
  • add Client Pool (#815)
  • RLM: Eager sandbox creation, conditional pip install (#834)
  • RLM: Add RLMEnv sandbox hooks for safer customization (#849)
  • RLM: Make FIFO IO non-blocking (#850)
  • CliAgentEnv: add SandboxMixin, refactor InterceptionServer (#847)
  • rlm: migrate sandbox executor to SandboxMixin (#875)
  • env worker integration (#832)
  • track vf + env version in metadata (#881)
  • handle empty metrics (#855)
  • move sanitize_metadata out of save_metadata (#852)
  • improve env client timeouts (#872)
  • Fix ty logger protocol typing in sandbox retry setup (#835)
  • Fix vf-eval concurrent rollout label to use effective cap (#836)

Evaluation UX, logging, and metrics

  • Add robust token usage tracking (#858)
  • Tighten vf-tui info preview formatting and typing checks (#830)
  • Add subtle --debug hint beneath Logs panel (#824)
  • Fix vf-eval non-TUI live overflow rendering (#883)
  • misc logging improvs (#882)

Setup, CLI, and configuration

  • vf-setup: prefer endpoints.toml, rename configs/lab->configs/rl, add GEPA configs, deprecate --vf-rl (#859)
  • Support long endpoint field names in TOML registries (#861)
  • prime CLI refactor (vf) (#870)
  • refactor: split RL trainer into optional in-repo verifiers-rl package (#843)
  • move rlm secrets out of vf and into research-environments (#856)

Documentation, workflows, and skills

  • Compile AGENTS docs from modular assets and make guidance concrete (#857)
  • skills setup (#873)
  • Strengthen lab AGENTS env-development guardrails (#876)
  • Clarify MCPEnv is for global read-only MCP servers (#838)
  • docs: remove parser-centric guidance from environment READMEs (#839)
  • docs: remove parser field from env init README template (#840)
  • chore: enforce ruff formatting and improve dev tooling docs (#845)

Integrations and environment packages

  • openenv: default template proj/ path and simplify prompt renderer signatures (#853)
  • remove vf pin in opencode_harbor (#844)
  • opencode env: TITO support, tunnel sync stop, add terminal-bench (#874)
  • ci: skip terminus_harbor in test-envs (#846)
  • fix math rubric timeouts (#831)
  • Fix for dir resolutions (#879)
  • Update browse-environments freshness and quality priorities (#884)
  • Clarify agent skill handling (#886)

v0.1.10.dev5

10 Feb 07:46
0df300b

Choose a tag to compare

Verifiers v0.1.10.dev5 Release Notes

Date: 02/10/2026

Full Changelog: v0.1.10.dev4...v0.1.10.dev5

Highlights since v0.1.10.dev4

  • Improved environment worker reliability and metadata tracking by migrating sandbox execution internals and recording verifiers/environment versions in rollout metadata.
  • Polished evaluation UX with a fix for non-TUI live overflow rendering and broader logging improvements.
  • Added and refined developer-facing guidance for skills and agent workflows, including clearer skill handling and browse-environment quality priorities.
  • Extended opencode harbor support with TITO support, tunnel sync stop behavior, and a terminal-bench task addition.

Incremental changes since v0.1.10.dev4

  • env worker integration (#832)
  • track vf + env version in metadata (#881)
  • misc logging improvs (#882)
  • rlm: migrate sandbox executor to SandboxMixin (#875)
  • opencode env: TITO support, tunnel sync stop, add terminal-bench (#874)
  • Fix vf-eval non-TUI live overflow rendering (#883)
  • Fix for dir resolutions (#879)
  • Update browse-environments freshness and quality priorities (#884)
  • Clarify agent skill handling (#886)

v0.1.10.dev4

09 Feb 02:53
56676d6

Choose a tag to compare

Verifiers v0.1.10.dev4 Release Notes

Date: 02/09/2026

Full Changelog: v0.1.10.dev3...v0.1.10.dev4

Highlights since v0.1.10.dev3

  • Refactored the vf CLI command surface around Prime CLI integration and plugin wiring.
  • Improved environment client timeout handling across worker and ZMQ clients, with regression coverage in timeout tests.
  • Added skills setup scaffolding and related docs updates for Prime Lab setup and workflow guidance.
  • Tightened AGENTS guidance for environment development guardrails in Lab-facing docs.

Incremental changes since v0.1.10.dev3

  • improve env client timeouts (#872)
  • skills setup (#873)
  • Strengthen lab AGENTS env-development guardrails (#876)
  • prime CLI refactor (vf) (#870)

v0.1.10.dev3

08 Feb 11:50
b8b502c

Choose a tag to compare

Verifiers v0.1.10.dev3 Release Notes

Date: 02/08/2026

Full Changelog: v0.1.9...v0.1.10.dev3

Highlights since v0.1.9

  • Added new environment capabilities, including OpenEnv integration, BrowserEnv integration, and env server support for more flexible tool and environment workflows.
  • Expanded evaluation UX with eval TUI, copy mode, improved logs/debug display, rollout token usage tracking, and richer saved-output rendering for tool calls.
  • Introduced and iterated on RLMEnv improvements: tool partitioning (tools, root_tools, sub_tools), better stop/error propagation, prompt/verbosity controls, safer sandbox lifecycle handling, and new sandbox hooks for customization.
  • Improved reliability across execution and infrastructure paths via retries for infrastructure and model-response errors, better auth/overlong prompt handling for OpenRouter, and cleanup fixes to avoid task/sandbox leakage.
  • Modernized setup and training ergonomics with vf-setup config changes (endpoints.toml, configs/rl, GEPA configs), support for long TOML endpoint field names, and an optional in-repo verifiers-rl package split.
  • Hardened runtime internals with CliAgentEnv sandbox/interception refactors, client pooling, non-blocking FIFO IO for RLM, and metadata/metrics handling fixes.
  • Added broader OpenEnv ecosystem support and examples (e.g., openenv_echo, openenv_textarena, opencode_harbor) with updated version requirements.

Incremental changes since v0.1.10.dev2

  • Compile AGENTS docs from modular assets and make guidance concrete (#857)
  • vf-setup: prefer endpoints.toml, rename configs/lab->configs/rl, add GEPA configs, deprecate --vf-rl (#859)
  • Support long endpoint field names in TOML registries (#861)
  • Add robust token usage tracking (#858)
  • move rlm secrets out of vf and into research-environments (#856)
  • CliAgentEnv: add SandboxMixin, refactor InterceptionServer (#847)
  • handle empty metrics (#855)
  • move sanitize_metadata out of save_metadata (#852)
  • openenv: default template proj/ path and simplify prompt renderer signatures (#853)
  • refactor: split RL trainer into optional in-repo verifiers-rl package (#843)
  • add Client Pool (#815)
  • chore: enforce ruff formatting and improve dev tooling docs (#845)
  • RLM: Make FIFO IO non-blocking (#850)
  • RLM: Add RLMEnv sandbox hooks for safer customization (#849)
  • RLM: Eager sandbox creation, conditional pip install (#834)
  • ci: skip terminus_harbor in test-envs (#846)
  • resume evals (#803)
  • remove vf pin in opencode_harbor (#844)
  • fix math rubric timeouts (#831)
  • docs: remove parser-centric guidance from environment READMEs (#839)
  • openenv integration (#829)
  • Fix ty logger protocol typing in sandbox retry setup (#835)
  • docs: remove parser field from env init README template (#840)
  • Clarify MCPEnv is for global read-only MCP servers (#838)
  • Fix vf-eval concurrent rollout label to use effective cap (#836)
  • Tighten vf-tui info preview formatting and typing checks (#830)
  • Add subtle --debug hint beneath Logs panel (#824)

v0.1.10.dev2

04 Feb 10:01

Choose a tag to compare

Verifiers v0.1.10.dev1 Release Notes

Date: 02/04/2026

Full Changelog: v0.1.10.dev0...v0.1.10.dev1

Changes since v0.1.10.dev0

  • info oaitools fix (#821)
  • Capture stdout/stderr for live display (#819)
  • track token usage in eval (#816)
  • RLM: show full user message (#818)
  • remove filesystem info from rlm system prompts (#817)
  • RLM: add prompt verbosity parameters (#814)
  • re-raise auth errs + fix overlong prompt err for openrouter (#813)
  • add default sandbox_labels to rlm-secrets (#810)
  • Improve vf-eval display (#809)
  • Increase Sandbox Default Thread Worker Count (#807)
  • Tool content validation (#806)
  • env server (#799)
  • Add Browser Env Integration (#732)
  • adjust CliAgentEnv sandbox creation timeout + remove DummyHarborEnv (#804)
  • verifiers: fix tool call rendering from saved outputs (#802)
  • RLM: remove code jail -> simplify code (#800)
  • Add sync bulk sandbox teardown for RLM env (#798)
  • overhaul saving outputs (#774)
  • Propagate RLM stop errors from root and sub tools (#797)
  • clean up on task cancelation to avoid resource leakage (#795)
  • CliAgentEnv: teardown sandboxes via bulk delete (#796)
  • RLM: Fix trajectory collision (#786)
  • lazy import datasets (#794)
  • add rLLM integration to docs
  • cancel outstanding tasks if one task raises in generate (#793)
  • revert wiki-search
  • update environments/README.md (#790)
  • util for enforcing env vars are set (#789)
  • return last result if retries exhausted (#782)
  • warning log in RLTrainer (#783)
  • RLM: Simplify code (#781)
  • hello world tasks for TerminusHarborEnv and OpenCodeHarborEnv (#775)
  • fix retry for invalid model response errors (#778)
  • Make local RLM REPL concurrency configurable (#777)
  • RLM: re-enable Sandboxes for both Python and Bash (#776)
  • fix alphabet-sort
  • fix alphabet-sort
  • raise on empty response error (openrouter) to trigger retries (#772)
  • mirror cli in toml config (#773)
  • fix double save results in vf-eval (#771)
  • remove log file for cli agent env (#770)
  • eval --debug mode to skip Rich (#769)
  • tools eval example
  • Harbor examples (#766)
  • Feature: Add tools metadata for eval viewer (#767)
  • expose sandbox labels in SandboxEnv and CliAgentEnv (#768)
  • RLM env stop condition fix (#757)
  • lazy init locked chromadb instance in wiki-search (#765)
  • created the rlm_secrets environment (#763)
  • Move RLM system prompt into first user prompt (#764)
  • gepa dep
  • integrated gepa training, ui to track (#747)
  • prime tunnel in cliagentenv (#746)
  • RLM: make bash REPL default, keep Python REPL optional (#758)
  • Sebastian/rlm file system 2026 01 20 (#756)
  • eval tui (#735)
  • multi-env evals config (#734)
  • Add retry support for infrastructure errors in vf-eval (#750)
  • RLM: tools, sub_tools, root_tools (#749)
  • optional DatasetBuilder pattern (#739)
  • Add copy mode to vf-tui (#745)
  • RLMEnv: Make sub-LLM calls work for training (#738)

v0.1.10.dev1

04 Feb 09:44
9928ae2

Choose a tag to compare

Verifiers v0.1.10.dev1 Release Notes

Date: 02/04/2026

Full Changelog: v0.1.10.dev0...v0.1.10.dev1

Changes since v0.1.10.dev0

  • info oaitools fix (#821)
  • Capture stdout/stderr for live display (#819)
  • track token usage in eval (#816)
  • RLM: show full user message (#818)
  • remove filesystem info from rlm system prompts (#817)
  • RLM: add prompt verbosity parameters (#814)
  • re-raise auth errs + fix overlong prompt err for openrouter (#813)
  • add default sandbox_labels to rlm-secrets (#810)
  • Improve vf-eval display (#809)
  • Increase Sandbox Default Thread Worker Count (#807)
  • Tool content validation (#806)
  • env server (#799)
  • Add Browser Env Integration (#732)
  • adjust CliAgentEnv sandbox creation timeout + remove DummyHarborEnv (#804)
  • verifiers: fix tool call rendering from saved outputs (#802)
  • RLM: remove code jail -> simplify code (#800)
  • Add sync bulk sandbox teardown for RLM env (#798)
  • overhaul saving outputs (#774)
  • Propagate RLM stop errors from root and sub tools (#797)
  • clean up on task cancelation to avoid resource leakage (#795)
  • CliAgentEnv: teardown sandboxes via bulk delete (#796)
  • RLM: Fix trajectory collision (#786)
  • lazy import datasets (#794)
  • add rLLM integration to docs
  • cancel outstanding tasks if one task raises in generate (#793)
  • revert wiki-search
  • update environments/README.md (#790)
  • util for enforcing env vars are set (#789)
  • return last result if retries exhausted (#782)
  • warning log in RLTrainer (#783)
  • RLM: Simplify code (#781)
  • hello world tasks for TerminusHarborEnv and OpenCodeHarborEnv (#775)
  • fix retry for invalid model response errors (#778)
  • Make local RLM REPL concurrency configurable (#777)
  • RLM: re-enable Sandboxes for both Python and Bash (#776)
  • fix alphabet-sort
  • fix alphabet-sort
  • raise on empty response error (openrouter) to trigger retries (#772)
  • mirror cli in toml config (#773)
  • fix double save results in vf-eval (#771)
  • remove log file for cli agent env (#770)
  • eval --debug mode to skip Rich (#769)
  • tools eval example
  • Harbor examples (#766)
  • Feature: Add tools metadata for eval viewer (#767)
  • expose sandbox labels in SandboxEnv and CliAgentEnv (#768)
  • RLM env stop condition fix (#757)
  • lazy init locked chromadb instance in wiki-search (#765)
  • created the rlm_secrets environment (#763)
  • Move RLM system prompt into first user prompt (#764)
  • gepa dep
  • integrated gepa training, ui to track (#747)
  • prime tunnel in cliagentenv (#746)
  • RLM: make bash REPL default, keep Python REPL optional (#758)
  • Sebastian/rlm file system 2026 01 20 (#756)
  • eval tui (#735)
  • multi-env evals config (#734)
  • Add retry support for infrastructure errors in vf-eval (#750)
  • RLM: tools, sub_tools, root_tools (#749)
  • optional DatasetBuilder pattern (#739)
  • Add copy mode to vf-tui (#745)
  • RLMEnv: Make sub-LLM calls work for training (#738)

v0.1.10.dev0

17 Jan 20:11
68de752

Choose a tag to compare

Verifiers v0.1.10.dev0 Release Notes

Date: 01/17/2026

Full Changelog: v0.1.9.post3...v0.1.10.dev0

v0.1.9.post3

14 Jan 19:08
6e7e31c

Choose a tag to compare

Verifiers v0.1.9 Release Notes

Date: 01/08/2026

Verifiers v0.1.9 introduces several new experimental environments, monitor rubrics for automatic metrics collection, improved error handling, and documentation overhaul.

Post-release update:

  • Tweaks to setup script (post1).
  • Fix for exporting setup script (post0).
  • Fix for gitignore section in setup script (post2).
  • Tweaks to setup script and endpoint defaults (post3).

Highlights

  • RLMEnv (Experimental): New environment implementing the Recursive Language Model (RLM) inference strategy, where language models decompose and recursively interact with input data through sandboxed REPL environments. Supports sub-LLM calls via llm_batch() function intercepted through HTTP proxy. See RLM paper.

  • GymEnv (Experimental): Universal Gym-compatible environment runner for standard RL gymnasium environments. Enables training on classic control tasks and custom Gym environments.

  • CliAgentEnv & HarborEnv (Experimental): New environments for running custom agent code in sandboxes (CliAgentEnv) and loading Harbor-format tasks (HarborEnv).

  • MCPEnv (Experimental): Environment for Model Context Protocol (MCP) server integration.

  • Monitor Rubrics: Each environment now automatically includes a monitor rubric that tracks environment-specific metrics without affecting rewards (weight=0). For example:

    • MultiTurnEnv: num_turns
    • ToolEnv: tool_call_count
    • SandboxEnv: sandbox_call_count, sandbox_total_time_seconds, sandbox_mean_time_seconds
    • PythonEnv: repl_call_count, repl_total_time_seconds, repl_mean_time_seconds
    • RLMEnv: Sub-LLM metrics and more
  • Improved Error Handling: New error chain helpers, better error propagation through rollouts, and abort_on_code_timeout support for sandbox environments.

  • Documentation Overhaul: Complete reorganization of documentation with new Mintlify-based docs, improved examples, and automatic docs sync workflow.

New Features

Environments

  • Add get_sandbox_request hook for per-rollout sandbox customization (#699)
  • Expose render_completion and add_trajectory_step methods with private/final guardrails (#679)
  • Add final_messages pattern for cleaner message handling (#677)
  • Support for token-in vLLM endpoint (#626)
  • Static make_dataset function for environments (#683)
  • Add alphabet-sort example environment (#695)
  • system_prompt is now prepended to existing prompts that don't already start with a system message

Evaluation & Training

  • Optionally enable independent per-rollout scoring: run and score rollouts independently rather than only in groups (#694)
  • vf-tui improvements: regex search modal and run details panel (#705)
  • Log eventloop lag during vf-eval (#687)
  • Log timings in vf-eval (#686)
  • Show rolling average as tqdm postfix (#693)
  • Option to bypass scoring for faster iteration (#645)
  • Add trajectory_id to TrajectoryStep (#675)

Rubrics

  • Add RLM monitor rubric for sub-LLM metrics (#698)
  • Improvements to math rubric with better timeout handling (#657)
  • JudgeRubric now accepts optional state argument (#684)

Error Handling

  • Helpers for error chains (#649)
  • Better error handling with abort_on_code_timeout (#659)
  • Handle all truncation cases (#637)
  • Raise ModelError when response.choices is None (#640)
  • Apply stop_errors pattern to StatefulToolEnv for parse/call errors (#618)
  • Normalize messages from sub-LLM calls to prevent errors (#664)

Bug Fixes

  • Fix tool duplication when calling add_tool on ToolEnv with shared list reference
  • Fix args_to_skip validation failure for dict type parameters in StatefulToolEnv (#674)
  • Fix empty slice handling (#701)
  • Fix wiki-search environment (#697)
  • Fix tool test environment (#692)
  • Fix PythonEnv deadlock (#652)
  • Fix auto-format dataset for message_type=completions (#624)
  • Fix math verify timeout (#620)
  • Fix sub-LLM metrics and context warnings
  • pip_install_packages="" no longer breaks sandbox (#633)
  • Remove prompt logprobs to reduce memory usage (#666)
  • Warn when ignoring system prompt/few-shot with prompt present (#668)
  • Handle empty completions in parse_answer (#672)

Infrastructure & Documentation

  • Ensure integrations can be installed via full path (#704)
  • Reorganize third-party env integrations (TextArena, ReasoningGym, etc.) (#682)
  • Experimental folder structure for newer environments (#643)
  • Overhaul docs with example configs (#700)
  • Update docs for v0.1.8 API (#670)
  • Add automatic docs sync workflow (#628)
  • Redirect RTD to shared Mintlify docs (#654)
  • Dynamic logger names (#639)
  • Use threading for sandbox client (#638)
  • Bump prime-sandboxes>=2.7.0 (#660)

vf-setup Command

The vf-setup command bootstraps a verifiers training workspace:

Default behavior (no flags):

  • Creates configs/ and environments/ directories
  • Downloads AGENTS.md, CLAUDE.md, and environments/AGENTS.md for AI coding assistants
  • Downloads configs/endpoints.py (API endpoint configuration)
  • Downloads lab configs for quick experimentation (configs/lab/*.toml)

With --prime-rl:

  • Installs prime-rl and syncs dependencies
  • Installs all environments from environments/ into the prime-rl workspace
  • Downloads prime-rl-specific configs to configs/prime-rl/

With --vf-rl:

  • Downloads configs/zero3.yaml (DeepSpeed config)
  • Downloads vf-rl configs to configs/vf-rl/

With --skip-agents-md:

  • Skips downloading AGENTS.md, CLAUDE.md, and environments/AGENTS.md

Migration Notes

  • Environments now automatically include monitor rubrics which track default class-specific metrics. If you were manually adding metrics for num_turns, tool_call_count, etc., these are now provided automatically.
  • Third-party integrations (TextArena, ReasoningGym) have been moved to verifiers.envs.integrations.
  • Experimental environments (GymEnv, MCPEnv, CliAgentEnv, HarborEnv, RLMEnv) are now in verifiers.envs.experimental and require explicit imports or verifiers[all] installation.

Full Changelog: v0.1.8.post2...v0.1.9

v0.1.9.post2

10 Jan 06:56

Choose a tag to compare

Verifiers v0.1.9 Release Notes

Date: 01/08/2026

Verifiers v0.1.9 introduces several new experimental environments, monitor rubrics for automatic metrics collection, improved error handling, and documentation overhaul.

Post-release update:

  • Tweaks to setup script (post1).
  • Fix for exporting setup script (post0).
  • Fix for gitignore section in setup script (post2).

Highlights

  • RLMEnv (Experimental): New environment implementing the Recursive Language Model (RLM) inference strategy, where language models decompose and recursively interact with input data through sandboxed REPL environments. Supports sub-LLM calls via llm_batch() function intercepted through HTTP proxy. See RLM paper.

  • GymEnv (Experimental): Universal Gym-compatible environment runner for standard RL gymnasium environments. Enables training on classic control tasks and custom Gym environments.

  • CliAgentEnv & HarborEnv (Experimental): New environments for running custom agent code in sandboxes (CliAgentEnv) and loading Harbor-format tasks (HarborEnv).

  • MCPEnv (Experimental): Environment for Model Context Protocol (MCP) server integration.

  • Monitor Rubrics: Each environment now automatically includes a monitor rubric that tracks environment-specific metrics without affecting rewards (weight=0). For example:

    • MultiTurnEnv: num_turns
    • ToolEnv: tool_call_count
    • SandboxEnv: sandbox_call_count, sandbox_total_time_seconds, sandbox_mean_time_seconds
    • PythonEnv: repl_call_count, repl_total_time_seconds, repl_mean_time_seconds
    • RLMEnv: Sub-LLM metrics and more
  • Improved Error Handling: New error chain helpers, better error propagation through rollouts, and abort_on_code_timeout support for sandbox environments.

  • Documentation Overhaul: Complete reorganization of documentation with new Mintlify-based docs, improved examples, and automatic docs sync workflow.

New Features

Environments

  • Add get_sandbox_request hook for per-rollout sandbox customization (#699)
  • Expose render_completion and add_trajectory_step methods with private/final guardrails (#679)
  • Add final_messages pattern for cleaner message handling (#677)
  • Support for token-in vLLM endpoint (#626)
  • Static make_dataset function for environments (#683)
  • Add alphabet-sort example environment (#695)
  • system_prompt is now prepended to existing prompts that don't already start with a system message

Evaluation & Training

  • Optionally enable independent per-rollout scoring: run and score rollouts independently rather than only in groups (#694)
  • vf-tui improvements: regex search modal and run details panel (#705)
  • Log eventloop lag during vf-eval (#687)
  • Log timings in vf-eval (#686)
  • Show rolling average as tqdm postfix (#693)
  • Option to bypass scoring for faster iteration (#645)
  • Add trajectory_id to TrajectoryStep (#675)

Rubrics

  • Add RLM monitor rubric for sub-LLM metrics (#698)
  • Improvements to math rubric with better timeout handling (#657)
  • JudgeRubric now accepts optional state argument (#684)

Error Handling

  • Helpers for error chains (#649)
  • Better error handling with abort_on_code_timeout (#659)
  • Handle all truncation cases (#637)
  • Raise ModelError when response.choices is None (#640)
  • Apply stop_errors pattern to StatefulToolEnv for parse/call errors (#618)
  • Normalize messages from sub-LLM calls to prevent errors (#664)

Bug Fixes

  • Fix tool duplication when calling add_tool on ToolEnv with shared list reference
  • Fix args_to_skip validation failure for dict type parameters in StatefulToolEnv (#674)
  • Fix empty slice handling (#701)
  • Fix wiki-search environment (#697)
  • Fix tool test environment (#692)
  • Fix PythonEnv deadlock (#652)
  • Fix auto-format dataset for message_type=completions (#624)
  • Fix math verify timeout (#620)
  • Fix sub-LLM metrics and context warnings
  • pip_install_packages="" no longer breaks sandbox (#633)
  • Remove prompt logprobs to reduce memory usage (#666)
  • Warn when ignoring system prompt/few-shot with prompt present (#668)
  • Handle empty completions in parse_answer (#672)

Infrastructure & Documentation

  • Ensure integrations can be installed via full path (#704)
  • Reorganize third-party env integrations (TextArena, ReasoningGym, etc.) (#682)
  • Experimental folder structure for newer environments (#643)
  • Overhaul docs with example configs (#700)
  • Update docs for v0.1.8 API (#670)
  • Add automatic docs sync workflow (#628)
  • Redirect RTD to shared Mintlify docs (#654)
  • Dynamic logger names (#639)
  • Use threading for sandbox client (#638)
  • Bump prime-sandboxes>=2.7.0 (#660)

vf-setup Command

The vf-setup command bootstraps a verifiers training workspace:

Default behavior (no flags):

  • Creates configs/ and environments/ directories
  • Downloads AGENTS.md, CLAUDE.md, and environments/AGENTS.md for AI coding assistants
  • Downloads configs/endpoints.py (API endpoint configuration)
  • Downloads lab configs for quick experimentation (configs/lab/*.toml)

With --prime-rl:

  • Installs prime-rl and syncs dependencies
  • Installs all environments from environments/ into the prime-rl workspace
  • Downloads prime-rl-specific configs to configs/prime-rl/

With --vf-rl:

  • Downloads configs/zero3.yaml (DeepSpeed config)
  • Downloads vf-rl configs to configs/vf-rl/

With --skip-agents-md:

  • Skips downloading AGENTS.md, CLAUDE.md, and environments/AGENTS.md

Migration Notes

  • Environments now automatically include monitor rubrics which track default class-specific metrics. If you were manually adding metrics for num_turns, tool_call_count, etc., these are now provided automatically.
  • Third-party integrations (TextArena, ReasoningGym) have been moved to verifiers.envs.integrations.
  • Experimental environments (GymEnv, MCPEnv, CliAgentEnv, HarborEnv, RLMEnv) are now in verifiers.envs.experimental and require explicit imports or verifiers[all] installation.

Full Changelog: v0.1.8.post2...v0.1.9