14 Feb 11:34

github-actions

0f1e334

v0.1.11.dev0 Latest

Latest

Verifiers v0.1.11.dev0 Release Notes

Date: 02/14/2026

Full Changelog: v0.1.10...v0.1.11.dev0

Assets 5

11 Feb 00:33

github-actions

v0.1.10

ccef044

v0.1.10

Verifiers v0.1.10 Release Notes

Date: 02/10/2026

Full Changelog: v0.1.9...v0.1.10

Highlights since v0.1.9

Expanded environment support with OpenEnv and BrowserEnv integrations, env worker plumbing, and continued improvements to CliAgentEnv/RLMEnv sandbox reliability and customization hooks.
Upgraded evaluation ergonomics with resumed evals, improved TUI info/log presentation, better rollout/token tracking, and non-TUI overflow rendering fixes.
Improved reliability across model/runtime boundaries with timeout hardening, safer sandbox lifecycle behavior, and richer error/metadata handling.
Modernized workspace setup and contributor workflows (vf-setup endpoint/config updates, GEPA config support, Prime CLI refactor, skills scaffolding, and AGENTS guidance updates).
Added opencode harbor enhancements, including TITO support, tunnel sync stop behavior, and terminal-bench task coverage.

Changes included in v0.1.10 (since v0.1.9)

Environment, rollout, and runtime improvements

openenv integration (#829)
Add Browser Env Integration (#732)
resume evals (#803)
add Client Pool (#815)
RLM: Eager sandbox creation, conditional pip install (#834)
RLM: Add RLMEnv sandbox hooks for safer customization (#849)
RLM: Make FIFO IO non-blocking (#850)
CliAgentEnv: add SandboxMixin, refactor InterceptionServer (#847)
rlm: migrate sandbox executor to SandboxMixin (#875)
env worker integration (#832)
track vf + env version in metadata (#881)
handle empty metrics (#855)
move sanitize_metadata out of save_metadata (#852)
improve env client timeouts (#872)
Fix ty logger protocol typing in sandbox retry setup (#835)
Fix vf-eval concurrent rollout label to use effective cap (#836)

Evaluation UX, logging, and metrics

Add robust token usage tracking (#858)
Tighten vf-tui info preview formatting and typing checks (#830)
Add subtle --debug hint beneath Logs panel (#824)
Fix vf-eval non-TUI live overflow rendering (#883)
misc logging improvs (#882)

Setup, CLI, and configuration

vf-setup: prefer endpoints.toml, rename configs/lab->configs/rl, add GEPA configs, deprecate --vf-rl (#859)
Support long endpoint field names in TOML registries (#861)
prime CLI refactor (vf) (#870)
refactor: split RL trainer into optional in-repo verifiers-rl package (#843)
move rlm secrets out of vf and into research-environments (#856)

Documentation, workflows, and skills

Compile AGENTS docs from modular assets and make guidance concrete (#857)
skills setup (#873)
Strengthen lab AGENTS env-development guardrails (#876)
Clarify MCPEnv is for global read-only MCP servers (#838)
docs: remove parser-centric guidance from environment READMEs (#839)
docs: remove parser field from env init README template (#840)
chore: enforce ruff formatting and improve dev tooling docs (#845)

Integrations and environment packages

openenv: default template proj/ path and simplify prompt renderer signatures (#853)
remove vf pin in opencode_harbor (#844)
opencode env: TITO support, tunnel sync stop, add terminal-bench (#874)
ci: skip terminus_harbor in test-envs (#846)
fix math rubric timeouts (#831)
Fix for dir resolutions (#879)
Update browse-environments freshness and quality priorities (#884)
Clarify agent skill handling (#886)

Assets 5

10 Feb 07:46

github-actions

v0.1.10.dev5

0df300b

v0.1.10.dev5

Verifiers v0.1.10.dev5 Release Notes

Date: 02/10/2026

Full Changelog: v0.1.10.dev4...v0.1.10.dev5

Highlights since v0.1.10.dev4

Improved environment worker reliability and metadata tracking by migrating sandbox execution internals and recording verifiers/environment versions in rollout metadata.
Polished evaluation UX with a fix for non-TUI live overflow rendering and broader logging improvements.
Added and refined developer-facing guidance for skills and agent workflows, including clearer skill handling and browse-environment quality priorities.
Extended opencode harbor support with TITO support, tunnel sync stop behavior, and a terminal-bench task addition.

Incremental changes since v0.1.10.dev4

env worker integration (#832)
track vf + env version in metadata (#881)
misc logging improvs (#882)
rlm: migrate sandbox executor to SandboxMixin (#875)
opencode env: TITO support, tunnel sync stop, add terminal-bench (#874)
Fix vf-eval non-TUI live overflow rendering (#883)
Fix for dir resolutions (#879)
Update browse-environments freshness and quality priorities (#884)
Clarify agent skill handling (#886)

Assets 5

09 Feb 02:53

github-actions

v0.1.10.dev4

56676d6

v0.1.10.dev4

Verifiers v0.1.10.dev4 Release Notes

Date: 02/09/2026

Full Changelog: v0.1.10.dev3...v0.1.10.dev4

Highlights since v0.1.10.dev3

Refactored the vf CLI command surface around Prime CLI integration and plugin wiring.
Improved environment client timeout handling across worker and ZMQ clients, with regression coverage in timeout tests.
Added skills setup scaffolding and related docs updates for Prime Lab setup and workflow guidance.
Tightened AGENTS guidance for environment development guardrails in Lab-facing docs.

Incremental changes since v0.1.10.dev3

improve env client timeouts (#872)
skills setup (#873)
Strengthen lab AGENTS env-development guardrails (#876)
prime CLI refactor (vf) (#870)

Assets 5

08 Feb 11:50

github-actions

v0.1.10.dev3

b8b502c

v0.1.10.dev3

Verifiers v0.1.10.dev3 Release Notes

Date: 02/08/2026

Full Changelog: v0.1.9...v0.1.10.dev3

Highlights since v0.1.9

Added new environment capabilities, including OpenEnv integration, BrowserEnv integration, and env server support for more flexible tool and environment workflows.
Expanded evaluation UX with eval TUI, copy mode, improved logs/debug display, rollout token usage tracking, and richer saved-output rendering for tool calls.
Introduced and iterated on RLMEnv improvements: tool partitioning (tools, root_tools, sub_tools), better stop/error propagation, prompt/verbosity controls, safer sandbox lifecycle handling, and new sandbox hooks for customization.
Improved reliability across execution and infrastructure paths via retries for infrastructure and model-response errors, better auth/overlong prompt handling for OpenRouter, and cleanup fixes to avoid task/sandbox leakage.
Modernized setup and training ergonomics with vf-setup config changes (endpoints.toml, configs/rl, GEPA configs), support for long TOML endpoint field names, and an optional in-repo verifiers-rl package split.
Hardened runtime internals with CliAgentEnv sandbox/interception refactors, client pooling, non-blocking FIFO IO for RLM, and metadata/metrics handling fixes.
Added broader OpenEnv ecosystem support and examples (e.g., openenv_echo, openenv_textarena, opencode_harbor) with updated version requirements.

Incremental changes since v0.1.10.dev2

Compile AGENTS docs from modular assets and make guidance concrete (#857)
vf-setup: prefer endpoints.toml, rename configs/lab->configs/rl, add GEPA configs, deprecate --vf-rl (#859)
Support long endpoint field names in TOML registries (#861)
Add robust token usage tracking (#858)
move rlm secrets out of vf and into research-environments (#856)
CliAgentEnv: add SandboxMixin, refactor InterceptionServer (#847)
handle empty metrics (#855)
move sanitize_metadata out of save_metadata (#852)
openenv: default template proj/ path and simplify prompt renderer signatures (#853)
refactor: split RL trainer into optional in-repo verifiers-rl package (#843)
add Client Pool (#815)
chore: enforce ruff formatting and improve dev tooling docs (#845)
RLM: Make FIFO IO non-blocking (#850)
RLM: Add RLMEnv sandbox hooks for safer customization (#849)
RLM: Eager sandbox creation, conditional pip install (#834)
ci: skip terminus_harbor in test-envs (#846)
resume evals (#803)
remove vf pin in opencode_harbor (#844)
fix math rubric timeouts (#831)
docs: remove parser-centric guidance from environment READMEs (#839)
openenv integration (#829)
Fix ty logger protocol typing in sandbox retry setup (#835)
docs: remove parser field from env init README template (#840)
Clarify MCPEnv is for global read-only MCP servers (#838)
Fix vf-eval concurrent rollout label to use effective cap (#836)
Tighten vf-tui info preview formatting and typing checks (#830)
Add subtle --debug hint beneath Logs panel (#824)

Assets 5

04 Feb 10:01

github-actions

v0.1.10.dev2

4341498

v0.1.10.dev2

Verifiers v0.1.10.dev1 Release Notes

Date: 02/04/2026

Full Changelog: v0.1.10.dev0...v0.1.10.dev1

Changes since v0.1.10.dev0

info oaitools fix (#821)
Capture stdout/stderr for live display (#819)
track token usage in eval (#816)
RLM: show full user message (#818)
remove filesystem info from rlm system prompts (#817)
RLM: add prompt verbosity parameters (#814)
re-raise auth errs + fix overlong prompt err for openrouter (#813)
add default sandbox_labels to rlm-secrets (#810)
Improve vf-eval display (#809)
Increase Sandbox Default Thread Worker Count (#807)
Tool content validation (#806)
env server (#799)
Add Browser Env Integration (#732)
adjust CliAgentEnv sandbox creation timeout + remove DummyHarborEnv (#804)
verifiers: fix tool call rendering from saved outputs (#802)
RLM: remove code jail -> simplify code (#800)
Add sync bulk sandbox teardown for RLM env (#798)
overhaul saving outputs (#774)
Propagate RLM stop errors from root and sub tools (#797)
clean up on task cancelation to avoid resource leakage (#795)
CliAgentEnv: teardown sandboxes via bulk delete (#796)
RLM: Fix trajectory collision (#786)
lazy import datasets (#794)
add rLLM integration to docs
cancel outstanding tasks if one task raises in generate (#793)
revert wiki-search
update environments/README.md (#790)
util for enforcing env vars are set (#789)
return last result if retries exhausted (#782)
warning log in RLTrainer (#783)
RLM: Simplify code (#781)
hello world tasks for TerminusHarborEnv and OpenCodeHarborEnv (#775)
fix retry for invalid model response errors (#778)
Make local RLM REPL concurrency configurable (#777)
RLM: re-enable Sandboxes for both Python and Bash (#776)
fix alphabet-sort
fix alphabet-sort
raise on empty response error (openrouter) to trigger retries (#772)
mirror cli in toml config (#773)
fix double save results in vf-eval (#771)
remove log file for cli agent env (#770)
eval --debug mode to skip Rich (#769)
tools eval example
Harbor examples (#766)
Feature: Add tools metadata for eval viewer (#767)
expose sandbox labels in SandboxEnv and CliAgentEnv (#768)
RLM env stop condition fix (#757)
lazy init locked chromadb instance in wiki-search (#765)
created the rlm_secrets environment (#763)
Move RLM system prompt into first user prompt (#764)
gepa dep
integrated gepa training, ui to track (#747)
prime tunnel in cliagentenv (#746)
RLM: make bash REPL default, keep Python REPL optional (#758)
Sebastian/rlm file system 2026 01 20 (#756)
eval tui (#735)
multi-env evals config (#734)
Add retry support for infrastructure errors in vf-eval (#750)
RLM: tools, sub_tools, root_tools (#749)
optional DatasetBuilder pattern (#739)
Add copy mode to vf-tui (#745)
RLMEnv: Make sub-LLM calls work for training (#738)

Assets 5

04 Feb 09:44

github-actions

v0.1.10.dev1

9928ae2

v0.1.10.dev1

Verifiers v0.1.10.dev1 Release Notes

Date: 02/04/2026

Full Changelog: v0.1.10.dev0...v0.1.10.dev1

Changes since v0.1.10.dev0

info oaitools fix (#821)
Capture stdout/stderr for live display (#819)
track token usage in eval (#816)
RLM: show full user message (#818)
remove filesystem info from rlm system prompts (#817)
RLM: add prompt verbosity parameters (#814)
re-raise auth errs + fix overlong prompt err for openrouter (#813)
add default sandbox_labels to rlm-secrets (#810)
Improve vf-eval display (#809)
Increase Sandbox Default Thread Worker Count (#807)
Tool content validation (#806)
env server (#799)
Add Browser Env Integration (#732)
adjust CliAgentEnv sandbox creation timeout + remove DummyHarborEnv (#804)
verifiers: fix tool call rendering from saved outputs (#802)
RLM: remove code jail -> simplify code (#800)
Add sync bulk sandbox teardown for RLM env (#798)
overhaul saving outputs (#774)
Propagate RLM stop errors from root and sub tools (#797)
clean up on task cancelation to avoid resource leakage (#795)
CliAgentEnv: teardown sandboxes via bulk delete (#796)
RLM: Fix trajectory collision (#786)
lazy import datasets (#794)
add rLLM integration to docs
cancel outstanding tasks if one task raises in generate (#793)
revert wiki-search
update environments/README.md (#790)
util for enforcing env vars are set (#789)
return last result if retries exhausted (#782)
warning log in RLTrainer (#783)
RLM: Simplify code (#781)
hello world tasks for TerminusHarborEnv and OpenCodeHarborEnv (#775)
fix retry for invalid model response errors (#778)
Make local RLM REPL concurrency configurable (#777)
RLM: re-enable Sandboxes for both Python and Bash (#776)
fix alphabet-sort
fix alphabet-sort
raise on empty response error (openrouter) to trigger retries (#772)
mirror cli in toml config (#773)
fix double save results in vf-eval (#771)
remove log file for cli agent env (#770)
eval --debug mode to skip Rich (#769)
tools eval example
Harbor examples (#766)
Feature: Add tools metadata for eval viewer (#767)
expose sandbox labels in SandboxEnv and CliAgentEnv (#768)
RLM env stop condition fix (#757)
lazy init locked chromadb instance in wiki-search (#765)
created the rlm_secrets environment (#763)
Move RLM system prompt into first user prompt (#764)
gepa dep
integrated gepa training, ui to track (#747)
prime tunnel in cliagentenv (#746)
RLM: make bash REPL default, keep Python REPL optional (#758)
Sebastian/rlm file system 2026 01 20 (#756)
eval tui (#735)
multi-env evals config (#734)
Add retry support for infrastructure errors in vf-eval (#750)
RLM: tools, sub_tools, root_tools (#749)
optional DatasetBuilder pattern (#739)
Add copy mode to vf-tui (#745)
RLMEnv: Make sub-LLM calls work for training (#738)

Assets 5

17 Jan 20:11

github-actions

v0.1.10.dev0

68de752

v0.1.10.dev0

Verifiers v0.1.10.dev0 Release Notes

Date: 01/17/2026

Full Changelog: v0.1.9.post3...v0.1.10.dev0

Assets 5

14 Jan 19:08

github-actions

v0.1.9.post3

6e7e31c

v0.1.9.post3

Verifiers v0.1.9 Release Notes

Date: 01/08/2026

Verifiers v0.1.9 introduces several new experimental environments, monitor rubrics for automatic metrics collection, improved error handling, and documentation overhaul.

Post-release update:

Tweaks to setup script (post1).
Fix for exporting setup script (post0).
Fix for gitignore section in setup script (post2).
Tweaks to setup script and endpoint defaults (post3).

Highlights

RLMEnv (Experimental): New environment implementing the Recursive Language Model (RLM) inference strategy, where language models decompose and recursively interact with input data through sandboxed REPL environments. Supports sub-LLM calls via llm_batch() function intercepted through HTTP proxy. See RLM paper.
GymEnv (Experimental): Universal Gym-compatible environment runner for standard RL gymnasium environments. Enables training on classic control tasks and custom Gym environments.
CliAgentEnv & HarborEnv (Experimental): New environments for running custom agent code in sandboxes (CliAgentEnv) and loading Harbor-format tasks (HarborEnv).
MCPEnv (Experimental): Environment for Model Context Protocol (MCP) server integration.
Monitor Rubrics: Each environment now automatically includes a monitor rubric that tracks environment-specific metrics without affecting rewards (weight=0). For example:
- MultiTurnEnv: num_turns
- ToolEnv: tool_call_count
- SandboxEnv: sandbox_call_count, sandbox_total_time_seconds, sandbox_mean_time_seconds
- PythonEnv: repl_call_count, repl_total_time_seconds, repl_mean_time_seconds
- RLMEnv: Sub-LLM metrics and more
Improved Error Handling: New error chain helpers, better error propagation through rollouts, and abort_on_code_timeout support for sandbox environments.
Documentation Overhaul: Complete reorganization of documentation with new Mintlify-based docs, improved examples, and automatic docs sync workflow.

New Features

Environments

Add get_sandbox_request hook for per-rollout sandbox customization (#699)
Expose render_completion and add_trajectory_step methods with private/final guardrails (#679)
Add final_messages pattern for cleaner message handling (#677)
Support for token-in vLLM endpoint (#626)
Static make_dataset function for environments (#683)
Add alphabet-sort example environment (#695)
system_prompt is now prepended to existing prompts that don't already start with a system message

Evaluation & Training

Optionally enable independent per-rollout scoring: run and score rollouts independently rather than only in groups (#694)
vf-tui improvements: regex search modal and run details panel (#705)
Log eventloop lag during vf-eval (#687)
Log timings in vf-eval (#686)
Show rolling average as tqdm postfix (#693)
Option to bypass scoring for faster iteration (#645)
Add trajectory_id to TrajectoryStep (#675)

Rubrics

Add RLM monitor rubric for sub-LLM metrics (#698)
Improvements to math rubric with better timeout handling (#657)
JudgeRubric now accepts optional state argument (#684)

Error Handling

Helpers for error chains (#649)
Better error handling with abort_on_code_timeout (#659)
Handle all truncation cases (#637)
Raise ModelError when response.choices is None (#640)
Apply stop_errors pattern to StatefulToolEnv for parse/call errors (#618)
Normalize messages from sub-LLM calls to prevent errors (#664)

Bug Fixes

Fix tool duplication when calling add_tool on ToolEnv with shared list reference
Fix args_to_skip validation failure for dict type parameters in StatefulToolEnv (#674)
Fix empty slice handling (#701)
Fix wiki-search environment (#697)
Fix tool test environment (#692)
Fix PythonEnv deadlock (#652)
Fix auto-format dataset for message_type=completions (#624)
Fix math verify timeout (#620)
Fix sub-LLM metrics and context warnings
pip_install_packages="" no longer breaks sandbox (#633)
Remove prompt logprobs to reduce memory usage (#666)
Warn when ignoring system prompt/few-shot with prompt present (#668)
Handle empty completions in parse_answer (#672)

Infrastructure & Documentation

Ensure integrations can be installed via full path (#704)
Reorganize third-party env integrations (TextArena, ReasoningGym, etc.) (#682)
Experimental folder structure for newer environments (#643)
Overhaul docs with example configs (#700)
Update docs for v0.1.8 API (#670)
Add automatic docs sync workflow (#628)
Redirect RTD to shared Mintlify docs (#654)
Dynamic logger names (#639)
Use threading for sandbox client (#638)
Bump prime-sandboxes>=2.7.0 (#660)

`vf-setup` Command

The vf-setup command bootstraps a verifiers training workspace:

Default behavior (no flags):

Creates configs/ and environments/ directories
Downloads AGENTS.md, CLAUDE.md, and environments/AGENTS.md for AI coding assistants
Downloads configs/endpoints.py (API endpoint configuration)
Downloads lab configs for quick experimentation (configs/lab/*.toml)

With --prime-rl:

Installs prime-rl and syncs dependencies
Installs all environments from environments/ into the prime-rl workspace
Downloads prime-rl-specific configs to configs/prime-rl/

With --vf-rl:

Downloads configs/zero3.yaml (DeepSpeed config)
Downloads vf-rl configs to configs/vf-rl/

With --skip-agents-md:

Skips downloading AGENTS.md, CLAUDE.md, and environments/AGENTS.md

Migration Notes

Environments now automatically include monitor rubrics which track default class-specific metrics. If you were manually adding metrics for num_turns, tool_call_count, etc., these are now provided automatically.
Third-party integrations (TextArena, ReasoningGym) have been moved to verifiers.envs.integrations.
Experimental environments (GymEnv, MCPEnv, CliAgentEnv, HarborEnv, RLMEnv) are now in verifiers.envs.experimental and require explicit imports or verifiers[all] installation.

Full Changelog: v0.1.8.post2...v0.1.9

Assets 5

10 Jan 06:56

github-actions

v0.1.9.post2

f095306

v0.1.9.post2

Verifiers v0.1.9 Release Notes

Date: 01/08/2026

Verifiers v0.1.9 introduces several new experimental environments, monitor rubrics for automatic metrics collection, improved error handling, and documentation overhaul.

Post-release update:

Tweaks to setup script (post1).
Fix for exporting setup script (post0).
Fix for gitignore section in setup script (post2).

Highlights

RLMEnv (Experimental): New environment implementing the Recursive Language Model (RLM) inference strategy, where language models decompose and recursively interact with input data through sandboxed REPL environments. Supports sub-LLM calls via llm_batch() function intercepted through HTTP proxy. See RLM paper.
GymEnv (Experimental): Universal Gym-compatible environment runner for standard RL gymnasium environments. Enables training on classic control tasks and custom Gym environments.
CliAgentEnv & HarborEnv (Experimental): New environments for running custom agent code in sandboxes (CliAgentEnv) and loading Harbor-format tasks (HarborEnv).
MCPEnv (Experimental): Environment for Model Context Protocol (MCP) server integration.
Monitor Rubrics: Each environment now automatically includes a monitor rubric that tracks environment-specific metrics without affecting rewards (weight=0). For example:
- MultiTurnEnv: num_turns
- ToolEnv: tool_call_count
- SandboxEnv: sandbox_call_count, sandbox_total_time_seconds, sandbox_mean_time_seconds
- PythonEnv: repl_call_count, repl_total_time_seconds, repl_mean_time_seconds
- RLMEnv: Sub-LLM metrics and more
Improved Error Handling: New error chain helpers, better error propagation through rollouts, and abort_on_code_timeout support for sandbox environments.
Documentation Overhaul: Complete reorganization of documentation with new Mintlify-based docs, improved examples, and automatic docs sync workflow.

New Features

Environments

Add get_sandbox_request hook for per-rollout sandbox customization (#699)
Expose render_completion and add_trajectory_step methods with private/final guardrails (#679)
Add final_messages pattern for cleaner message handling (#677)
Support for token-in vLLM endpoint (#626)
Static make_dataset function for environments (#683)
Add alphabet-sort example environment (#695)
system_prompt is now prepended to existing prompts that don't already start with a system message

Evaluation & Training

Optionally enable independent per-rollout scoring: run and score rollouts independently rather than only in groups (#694)
vf-tui improvements: regex search modal and run details panel (#705)
Log eventloop lag during vf-eval (#687)
Log timings in vf-eval (#686)
Show rolling average as tqdm postfix (#693)
Option to bypass scoring for faster iteration (#645)
Add trajectory_id to TrajectoryStep (#675)

Rubrics

Add RLM monitor rubric for sub-LLM metrics (#698)
Improvements to math rubric with better timeout handling (#657)
JudgeRubric now accepts optional state argument (#684)

Error Handling

Helpers for error chains (#649)
Better error handling with abort_on_code_timeout (#659)
Handle all truncation cases (#637)
Raise ModelError when response.choices is None (#640)
Apply stop_errors pattern to StatefulToolEnv for parse/call errors (#618)
Normalize messages from sub-LLM calls to prevent errors (#664)

Bug Fixes

Fix tool duplication when calling add_tool on ToolEnv with shared list reference
Fix args_to_skip validation failure for dict type parameters in StatefulToolEnv (#674)
Fix empty slice handling (#701)
Fix wiki-search environment (#697)
Fix tool test environment (#692)
Fix PythonEnv deadlock (#652)
Fix auto-format dataset for message_type=completions (#624)
Fix math verify timeout (#620)
Fix sub-LLM metrics and context warnings
pip_install_packages="" no longer breaks sandbox (#633)
Remove prompt logprobs to reduce memory usage (#666)
Warn when ignoring system prompt/few-shot with prompt present (#668)
Handle empty completions in parse_answer (#672)

Infrastructure & Documentation

Ensure integrations can be installed via full path (#704)
Reorganize third-party env integrations (TextArena, ReasoningGym, etc.) (#682)
Experimental folder structure for newer environments (#643)
Overhaul docs with example configs (#700)
Update docs for v0.1.8 API (#670)
Add automatic docs sync workflow (#628)
Redirect RTD to shared Mintlify docs (#654)
Dynamic logger names (#639)
Use threading for sandbox client (#638)
Bump prime-sandboxes>=2.7.0 (#660)

`vf-setup` Command

The vf-setup command bootstraps a verifiers training workspace:

Default behavior (no flags):

Creates configs/ and environments/ directories
Downloads AGENTS.md, CLAUDE.md, and environments/AGENTS.md for AI coding assistants
Downloads configs/endpoints.py (API endpoint configuration)
Downloads lab configs for quick experimentation (configs/lab/*.toml)

With --prime-rl:

Installs prime-rl and syncs dependencies
Installs all environments from environments/ into the prime-rl workspace
Downloads prime-rl-specific configs to configs/prime-rl/

With --vf-rl:

Downloads configs/zero3.yaml (DeepSpeed config)
Downloads vf-rl configs to configs/vf-rl/

With --skip-agents-md:

Skips downloading AGENTS.md, CLAUDE.md, and environments/AGENTS.md

Migration Notes

Environments now automatically include monitor rubrics which track default class-specific metrics. If you were manually adding metrics for num_turns, tool_call_count, etc., these are now provided automatically.
Third-party integrations (TextArena, ReasoningGym) have been moved to verifiers.envs.integrations.
Experimental environments (GymEnv, MCPEnv, CliAgentEnv, HarborEnv, RLMEnv) are now in verifiers.envs.experimental and require explicit imports or verifiers[all] installation.

Full Changelog: v0.1.8.post2...v0.1.9

Assets 5

Releases: PrimeIntellect-ai/verifiers

v0.1.11.dev0

Verifiers v0.1.11.dev0 Release Notes

Uh oh!

v0.1.10

Verifiers v0.1.10 Release Notes

Highlights since v0.1.9

Changes included in v0.1.10 (since v0.1.9)

Environment, rollout, and runtime improvements

Evaluation UX, logging, and metrics

Setup, CLI, and configuration

Documentation, workflows, and skills

Integrations and environment packages

Uh oh!

v0.1.10.dev5

Verifiers v0.1.10.dev5 Release Notes

Highlights since v0.1.10.dev4

Incremental changes since v0.1.10.dev4

Uh oh!

v0.1.10.dev4

Verifiers v0.1.10.dev4 Release Notes

Highlights since v0.1.10.dev3

Incremental changes since v0.1.10.dev3

Uh oh!

v0.1.10.dev3

Verifiers v0.1.10.dev3 Release Notes

Highlights since v0.1.9

Incremental changes since v0.1.10.dev2

Uh oh!

v0.1.10.dev2

Verifiers v0.1.10.dev1 Release Notes

Changes since v0.1.10.dev0

Uh oh!

v0.1.10.dev1

Verifiers v0.1.10.dev1 Release Notes

Changes since v0.1.10.dev0

Uh oh!

v0.1.10.dev0

Verifiers v0.1.10.dev0 Release Notes

Uh oh!

v0.1.9.post3

Verifiers v0.1.9 Release Notes

Highlights

New Features

Environments

Evaluation & Training

Rubrics

Error Handling

Bug Fixes

Infrastructure & Documentation

vf-setup Command

Migration Notes

Uh oh!

v0.1.9.post2

Verifiers v0.1.9 Release Notes

Highlights

New Features

Environments

Evaluation & Training

Rubrics

Error Handling

Bug Fixes

Infrastructure & Documentation

vf-setup Command

Migration Notes

Uh oh!

`vf-setup` Command

`vf-setup` Command