Merged
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -13,9 +13,9 @@ repos:
types_or: [python, pyi]
- id: sync-agents-md
name: Sync AGENTS.md from docs
entry: uv run python scripts/sync_agents_md.py
entry: uv run python scripts/sync.py
language: system
files: ^docs/(overview|environments)\.md$
files: ^(docs/environments\.md|assets/agents/.*\.md|scripts/sync\.py)$
pass_filenames: false
- id: ty
name: ty
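The widened `files` pattern in the hook above can be sanity-checked with a quick script (the sample paths are illustrative):

```python
import re

# The new `files` pattern from the sync-agents-md hook.
pattern = re.compile(r"^(docs/environments\.md|assets/agents/.*\.md|scripts/sync\.py)$")

for path in [
    "docs/environments.md",                     # matches
    "assets/agents/common_best_practices.md",   # matches
    "scripts/sync.py",                          # matches
    "docs/overview.md",                         # no longer triggers the hook
]:
    print(path, bool(pattern.match(path)))
```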
130 changes: 13 additions & 117 deletions AGENTS.md
@@ -1,125 +1,21 @@
# AGENTS.md

This guide covers best practices for building environments with `verifiers` and using them to train and evaluate LLMs. It is downloaded automatically using the setup script below (which has likely already been run if you're reading this). See `environments/AGENTS.md` for more details.
<!-- Generated for repository development workflows. Do not edit directly. -->

---
## Shared Best Practices (All Contexts)

Verifiers is our library for creating environments to train and evaluate LLMs.
These points are direct restatements of Verifiers docs so agents can follow the same golden-path workflows.

Environments contain everything required to run and evaluate a model on a particular task:
- A *dataset* of task inputs
- A *harness* for the model (tools, sandboxes, context management, etc.)
- A reward function or *rubric* to score the model's performance
- Environments are expected to expose `load_environment(...) -> vf.Environment` and be installable with `prime env install <env-name>`. (See `docs/overview.md` and `docs/environments.md`.)
- Validate environment behavior with `prime eval run <env-name> ...` before sharing/publishing changes. (See `docs/overview.md` and `docs/development.md`.)
- Use `ToolEnv`/`MCPEnv` for stateless tools and `StatefulToolEnv` when per-rollout state must persist (sandbox/session/db handles). (See `docs/environments.md`.)
- If external API keys are required, validate them in `load_environment()` with `vf.ensure_keys(...)` so failures are explicit and early. (See `docs/environments.md`.)

Environments can be used for training models with reinforcement learning (RL), evaluating capabilities, generating synthetic data, experimenting with agent harnesses, and more.
## Repository Development Notes

Verifiers is tightly integrated with the [Environments Hub](https://app.primeintellect.ai/dashboard/environments?ex_sort=most_stars), as well as our training framework [prime-rl](https://github.com/PrimeIntellect-ai/prime-rl) and our [Hosted Training](https://app.primeintellect.ai/dashboard/training) platform.
Use this guidance when contributing to the `verifiers` repository itself.

## Getting Started

Ensure you have `uv` installed, as well as the `prime` [CLI](https://docs.primeintellect.ai/cli-reference/introduction) tool:
```bash
# install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# install the prime CLI
uv tool install prime
# log in to the Prime Intellect platform
prime login
```
To set up a new workspace for developing environments, do:
```bash
# ~/dev/my-lab
prime lab setup
```

This sets up a Python project if needed (with `uv init`), installs `verifiers` (with `uv add verifiers`), creates the recommended workspace structure, and downloads useful starter files:
```
configs/
├── endpoints.toml # OpenAI-compatible API endpoint configuration
├── rl/ # Example configs for Hosted Training
├── eval/ # Example multi-environment eval configs
└── gepa/ # Example configs for prompt optimization
environments/
└── AGENTS.md # Documentation for AI coding agents
AGENTS.md # Top-level documentation for AI coding agents
CLAUDE.md # Claude-specific pointer to AGENTS.md
```

Alternatively, add `verifiers` to an existing project:
```bash
uv add verifiers && prime lab setup --skip-install
```

Environments built with Verifiers are self-contained Python modules. To initialize a fresh environment template, do:
```bash
prime env init my-env # creates a new template in ./environments/my_env
```

This will create a new module called `my_env` with a basic environment template.
```
environments/my_env/
├── my_env.py # Main implementation
├── pyproject.toml # Dependencies and metadata
└── README.md # Documentation
```

Environment modules should expose a `load_environment` function that returns an `Environment` instance and can accept custom arguments. For example:
```python
# my_env.py
import verifiers as vf

def load_environment(dataset_name: str = 'gsm8k') -> vf.Environment:
    dataset = vf.load_example_dataset(dataset_name)  # 'question'

    async def correct_answer(completion, answer) -> float:
        completion_ans = completion[-1]['content']
        return 1.0 if completion_ans == answer else 0.0

    rubric = vf.Rubric(funcs=[correct_answer])
    env = vf.SingleTurnEnv(dataset=dataset, rubric=rubric)
    return env
```
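The reward function in this example can be exercised on its own; the chat-message list shape is inferred from the snippet:

```python
import asyncio

# Standalone copy of the rubric function from the example above.
async def correct_answer(completion, answer) -> float:
    completion_ans = completion[-1]['content']
    return 1.0 if completion_ans == answer else 0.0

completion = [
    {'role': 'user', 'content': 'What is 2 + 2?'},
    {'role': 'assistant', 'content': '4'},
]
print(asyncio.run(correct_answer(completion, '4')))  # 1.0
```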

To install the environment module into your project, do:
```bash
prime env install my-env # installs from ./environments/my_env
```

To install an environment from the Environments Hub into your project, do:
```bash
prime env install primeintellect/math-python
```

To run a local evaluation with any OpenAI-compatible model, do:
```bash
prime eval run my-env -m gpt-5-nano # run and save eval results locally
```
Evaluations use [Prime Inference](https://docs.primeintellect.ai/inference/overview) by default; configure your own API endpoints in `./configs/endpoints.toml`.

View local evaluation results in the terminal UI:
```bash
prime eval tui
```
In the TUI, press `c` to open Copy Mode for prompt/completion text; highlight and press `c` again to copy.

To publish the environment to the [Environments Hub](https://app.primeintellect.ai/dashboard/environments?ex_sort=most_stars), do:
```bash
prime env push --path ./environments/my_env
```

To run an evaluation directly from the Environments Hub, do:
```bash
prime eval run primeintellect/math-python
```

## Documentation

**[Environments](environments.md)** — Create datasets, rubrics, and custom multi-turn interaction protocols.

**[Evaluation](evaluation.md)** — Evaluate models using your environments.

**[Training](training.md)** — Train models in your environments with reinforcement learning.

**[Development](development.md)** — Contributing to verifiers.

**[API Reference](reference.md)** — Understanding the API and data structures.

**[FAQs](faqs.md)** — Other frequently asked questions.
- Run the documented contributor checks for touched areas: `uv run ruff check --fix .`, `uv run pytest tests/`, and `uv run pre-commit run --all-files` as needed. (See `docs/development.md`.)
- Keep changes aligned with documented architecture (`verifiers/`, `environments/`, `configs/`, `tests/`, `docs/`) and update docs when behavior changes. (See `docs/development.md`.)
- Prefer a single clear path over maintaining parallel approaches by default; if two options exist, preserve both only when there is an explicit long-term reason.
- Aggressively deprecate/remove inferior paths when they are not part of an intended multi-option contract, especially in repo-internal development workflows.
4 changes: 3 additions & 1 deletion CLAUDE.md
@@ -1,3 +1,5 @@
# CLAUDE.md

Before beginning any user tasks, please refer to `AGENTS.md` and follow all relevant guidelines, and view `environments/AGENTS.md` for information on building environments with `verifiers`. Treat all `AGENTS.md` files as equivalent to `CLAUDE.md` files.
<!-- Generated for repository development workflows. Do not edit directly. -->

Before beginning work in this repository, read `AGENTS.md` and follow all scoped AGENTS guidance.
8 changes: 8 additions & 0 deletions assets/agents/common_best_practices.md
@@ -0,0 +1,8 @@
## Shared Best Practices (All Contexts)

These points are direct restatements of Verifiers docs so agents can follow the same golden-path workflows.

- Environments are expected to expose `load_environment(...) -> vf.Environment` and be installable with `prime env install <env-name>`. (See `docs/overview.md` and `docs/environments.md`.)
- Validate environment behavior with `prime eval run <env-name> ...` before sharing/publishing changes. (See `docs/overview.md` and `docs/development.md`.)
- Use `ToolEnv`/`MCPEnv` for stateless tools and `StatefulToolEnv` when per-rollout state must persist (sandbox/session/db handles). (See `docs/environments.md`.)
- If external API keys are required, validate them in `load_environment()` with `vf.ensure_keys(...)` so failures are explicit and early. (See `docs/environments.md`.)
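The fail-fast key validation described above can be sketched with a minimal stand-in for `vf.ensure_keys` (its exact signature is an assumption here, and `EXAMPLE_API_KEY` is a hypothetical key name):

```python
import os

def ensure_keys(required):
    """Minimal stand-in for vf.ensure_keys: fail fast when required keys are missing."""
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {missing}")

os.environ.setdefault("EXAMPLE_API_KEY", "sk-example")  # hypothetical key name
ensure_keys(["EXAMPLE_API_KEY"])  # passes silently
```

Calling this at the top of `load_environment()` surfaces a clear error at load time instead of a confusing failure mid-rollout.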
62 changes: 62 additions & 0 deletions assets/agents/compilation_design.md
@@ -0,0 +1,62 @@
# AGENTS / CLAUDE Compilation Design

## Problem

The repository currently uses a single root `AGENTS.md` as both:

1. Contributor guidance for this repo.
2. Downloaded guidance for end users (`prime lab setup` / `vf-setup`).

That coupling makes edits noisy and mixes audiences.

## Goals

- Separate **source-of-truth content** for:
- shared best practices,
- repo-only guidance,
- end-user lab guidance.
- Compile outputs for both audiences without duplicating text.
- Move setup-downloaded AGENTS/CLAUDE sources under `assets/lab/`.
- Keep environment guide generation (`environments/AGENTS.md`) in the same compile flow.

## Proposed Source Layout

```
assets/agents/
├── common_best_practices.md
├── repo_development_best_practices.md
├── end_user_best_practices.md
└── compilation_design.md
```

## Compiled Outputs

Generated by the repository sync tooling:

- `AGENTS.md` = common + repo development sections.
- `CLAUDE.md` = repo-oriented pointer to `AGENTS.md`.
- `assets/lab/AGENTS.md` = common + end-user sections.
- `assets/lab/CLAUDE.md` = end-user pointer to workspace AGENTS files.
- `environments/AGENTS.md` and `assets/lab/environments/AGENTS.md` = built from `docs/environments.md`.
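The compile mapping above could be expressed as data; the section keys below are assumptions for illustration, not the actual generator's schema:

```python
# Sketch only: maps compiled outputs to the source sections listed above.
TARGETS = {
    "AGENTS.md": ["common_best_practices", "repo_development_best_practices"],
    "assets/lab/AGENTS.md": ["common_best_practices", "end_user_best_practices"],
}

for output, sections in TARGETS.items():
    print(output, "<-", " + ".join(sections))
```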

## Portability Note

Files under `assets/` are intended to be copied into lab workspaces.
They should not rely on repository-local script paths or suggest a fixed provenance location.

## Setup Integration

`verifiers/scripts/setup.py` should download AGENTS/CLAUDE files from `assets/lab/` so the setup path consumes end-user docs only.

## Initial Implementation Scope

- Add modular docs stubs for shared/repo/end-user guidance.
- Replace sync script with a compiler-style generator for all targets.
- Update setup downloader URLs to `assets/lab/...`.
- Keep generated root files for repo contributors.

## Future Extensions

- Add section-level metadata (ordering, include conditions).
- Add CI check to enforce generated files are up to date.
- Optionally generate additional assistant-specific pointers (e.g., Cursor/Gemini).
8 changes: 8 additions & 0 deletions assets/agents/end_user_best_practices.md
@@ -0,0 +1,8 @@
## End-User Lab Workspace Notes

Use this guidance in projects created via `prime lab setup`.

- Use the documented workspace flow: `prime env init` → `prime env install` → `prime eval run`.
- Keep each environment self-contained under `environments/<env_name>/` with `pyproject.toml`, implementation, and README.
- Document required environment variables in README and validate missing keys early with `vf.ensure_keys(...)`.
- Use `prime env push --path ./environments/<env_name>` only after local eval behavior is verified.
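The workspace flow above, as a concrete command sequence (the environment name `my-env` is illustrative; run from the workspace root):

```shell
# End-to-end lab flow from the notes above; `my-env` is a placeholder name.
prime env init my-env                          # scaffold ./environments/my_env
prime env install my-env                       # install it into the project
prime eval run my-env -m gpt-5-nano            # verify behavior locally
prime env push --path ./environments/my_env    # publish once results look right
```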
8 changes: 8 additions & 0 deletions assets/agents/repo_development_best_practices.md
@@ -0,0 +1,8 @@
## Repository Development Notes

Use this guidance when contributing to the `verifiers` repository itself.

- Run the documented contributor checks for touched areas: `uv run ruff check --fix .`, `uv run pytest tests/`, and `uv run pre-commit run --all-files` as needed. (See `docs/development.md`.)
- Keep changes aligned with documented architecture (`verifiers/`, `environments/`, `configs/`, `tests/`, `docs/`) and update docs when behavior changes. (See `docs/development.md`.)
- Prefer a single clear path over maintaining parallel approaches by default; if two options exist, preserve both only when there is an explicit long-term reason.
- Aggressively deprecate/remove inferior paths when they are not part of an intended multi-option contract, especially in repo-internal development workflows.
23 changes: 23 additions & 0 deletions assets/lab/AGENTS.md
@@ -0,0 +1,23 @@
# AGENTS.md

<!-- Generated for lab workspaces. -->

This AGENTS guide is intended for end users working in a `prime lab setup` workspace.

## Shared Best Practices (All Contexts)

These points are direct restatements of Verifiers docs so agents can follow the same golden-path workflows.

- Environments are expected to expose `load_environment(...) -> vf.Environment` and be installable with `prime env install <env-name>`. (See `docs/overview.md` and `docs/environments.md`.)
- Validate environment behavior with `prime eval run <env-name> ...` before sharing/publishing changes. (See `docs/overview.md` and `docs/development.md`.)
- Use `ToolEnv`/`MCPEnv` for stateless tools and `StatefulToolEnv` when per-rollout state must persist (sandbox/session/db handles). (See `docs/environments.md`.)
- If external API keys are required, validate them in `load_environment()` with `vf.ensure_keys(...)` so failures are explicit and early. (See `docs/environments.md`.)

## End-User Lab Workspace Notes

Use this guidance in projects created via `prime lab setup`.

- Use the documented workspace flow: `prime env init` → `prime env install` → `prime eval run`.
- Keep each environment self-contained under `environments/<env_name>/` with `pyproject.toml`, implementation, and README.
- Document required environment variables in README and validate missing keys early with `vf.ensure_keys(...)`.
- Use `prime env push --path ./environments/<env_name>` only after local eval behavior is verified.
7 changes: 7 additions & 0 deletions assets/lab/CLAUDE.md
@@ -0,0 +1,7 @@
# CLAUDE.md

<!-- Generated for lab workspaces. -->

Before beginning any task, read `AGENTS.md` and `environments/AGENTS.md` in this workspace.

Treat all `AGENTS.md` files as equivalent to `CLAUDE.md` files.