27 changes: 27 additions & 0 deletions .cursor/BUGBOT.md
@@ -24,3 +24,30 @@ Any PR that adds or removes an environment from the `environments/` folder must
- Update the "What to look at for each pattern" section if applicable

If an environment is added or removed without a corresponding `environments/README.md` update, request that the author add the necessary changes.

## Skills Updates

Any PR that changes user-facing Prime or Verifiers workflows for environment development, browsing, review, evaluation, GEPA optimization, or RL training must update the corresponding skills under `skills/`.

This includes changes to command contracts, defaults, or behavior in:

- `docs/overview.md`
- `docs/environments.md`
- `docs/evaluation.md`
- `docs/training.md`
- `docs/faqs.md`
- `docs/prime_cli_verifiers_unification_design.md`
- `verifiers/scripts/*.py`
- `verifiers/cli/plugins/prime.py`

When these files change, verify and update any affected skill files:

- `skills/create-environments/SKILL.md`
- `skills/browse-environments/SKILL.md`
- `skills/review-environments/SKILL.md`
- `skills/evaluate-environments/SKILL.md`
- `skills/optimize-with-environments/SKILL.md`
- `skills/train-with-environments/SKILL.md`
- `skills/brainstorm/SKILL.md`

If workflow-relevant changes are detected without matching skill updates, request that the author update the impacted skills before merge.
10 changes: 7 additions & 3 deletions README.md
@@ -73,8 +73,12 @@ prime lab setup
This sets up a Python project if needed (with `uv init`), installs `verifiers` (with `uv add verifiers`), creates the recommended workspace structure, and downloads useful starter files:
```
configs/
-├── endpoints.py # OpenAI-compatible API endpoint configuration
-└── lab/ # Example configs for Hosted Training
+├── endpoints.toml # OpenAI-compatible API endpoint configuration
+├── rl/ # Example configs for Hosted Training
+├── eval/ # Example multi-environment eval configs
+└── gepa/ # Example configs for prompt optimization
+.prime/
+└── skills/ # Bundled workflow skills for create/browse/review/eval/GEPA/train/brainstorm
environments/
└── AGENTS.md # Documentation for AI coding agents
AGENTS.md # Top-level documentation for AI coding agents
@@ -136,7 +140,7 @@ To run a local evaluation with any OpenAI-compatible model, do:
```bash
prime eval run my-env -m gpt-5-nano # run and save eval results locally
```
-Evaluations use [Prime Inference](https://docs.primeintellect.ai/inference/overview) by default; configure your own API endpoints in `./configs/endpoints.py`.
+Evaluations use [Prime Inference](https://docs.primeintellect.ai/inference/overview) by default; configure your own API endpoints in `./configs/endpoints.toml`.

View local evaluation results in the terminal UI:
```bash
2 changes: 2 additions & 0 deletions assets/agents/end_user_best_practices.md
@@ -2,6 +2,8 @@

Use this guidance in projects created via `prime lab setup`.

- Treat `.prime/skills/` as the canonical skill entrypoint in Lab workspaces. Use the bundled skills first for create/browse/review/eval/GEPA/train/brainstorm workflows before ad hoc approaches.
- Keep endpoint aliases in `./configs/endpoints.toml` and use `endpoint_id`/model shortcuts in commands and configs.
- Use the documented workspace flow: `prime env init` → `prime env install` → `prime eval run`.
- Keep each environment self-contained under `environments/<env_name>/` with `pyproject.toml`, implementation, and README.
- Document required environment variables in README and validate missing keys early with `vf.ensure_keys(...)`.
2 changes: 2 additions & 0 deletions assets/lab/AGENTS.md
@@ -17,6 +17,8 @@ These points are direct restatements of Verifiers docs so agents can follow the

Use this guidance in projects created via `prime lab setup`.

- Treat `.prime/skills/` as the canonical skill entrypoint in Lab workspaces. Use the bundled skills first for create/browse/review/eval/GEPA/train/brainstorm workflows before ad hoc approaches.
- Keep endpoint aliases in `./configs/endpoints.toml` and use `endpoint_id`/model shortcuts in commands and configs.
- Use the documented workspace flow: `prime env init` → `prime env install` → `prime eval run`.
- Keep each environment self-contained under `environments/<env_name>/` with `pyproject.toml`, implementation, and README.
- Document required environment variables in README and validate missing keys early with `vf.ensure_keys(...)`.
27 changes: 8 additions & 19 deletions docs/evaluation.md
@@ -66,34 +66,23 @@ prime eval run my-env -x '{"max_turns": 20}'
| `--model` | `-m` | `openai/gpt-4.1-mini` | Model name or endpoint alias |
| `--api-base-url` | `-b` | `https://api.pinference.ai/api/v1` | API base URL |
| `--api-key-var` | `-k` | `PRIME_API_KEY` | Environment variable containing API key |
-| `--endpoints-path` | `-e` | `./configs/endpoints.toml` | Path to endpoints registry (`.toml` preferred, `.py` supported) |
+| `--endpoints-path` | `-e` | `./configs/endpoints.toml` | Path to TOML endpoints registry |
| `--header` | — | — | Extra HTTP header (`Name: Value`), repeatable |

-For convenience, define model endpoints in `./configs/endpoints.toml` (or `./configs/endpoints.py`) to avoid repeating URL and key flags.
-
-```python
-ENDPOINTS = {
-    "gpt-4.1-mini": {
-        "model": "gpt-4.1-mini",
-        "url": "https://api.openai.com/v1",
-        "key": "OPENAI_API_KEY",
-    },
-    "qwen3-235b-i": {
-        "model": "qwen/qwen3-235b-a22b-instruct-2507",
-        "url": "https://api.pinference.ai/api/v1",
-        "key": "PRIME_API_KEY",
-    },
-}
-```
-
-Equivalent TOML format:
+For convenience, define model endpoints in `./configs/endpoints.toml` to avoid repeating URL and key flags.

```toml
[[endpoint]]
endpoint_id = "gpt-4.1-mini"
model = "gpt-4.1-mini"
url = "https://api.openai.com/v1"
key = "OPENAI_API_KEY"

+[[endpoint]]
+endpoint_id = "qwen3-235b-i"
+model = "qwen/qwen3-235b-a22b-instruct-2507"
+url = "https://api.pinference.ai/api/v1"
+key = "PRIME_API_KEY"
```

To define equivalent replicas, add multiple `[[endpoint]]` entries with the same `endpoint_id`.
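A minimal sketch of that replica pattern, reusing the alias above (the second provider URL and its key variable are illustrative placeholders, not real endpoints):

```toml
[[endpoint]]
endpoint_id = "qwen3-235b-i"
model = "qwen/qwen3-235b-a22b-instruct-2507"
url = "https://api.pinference.ai/api/v1"
key = "PRIME_API_KEY"

[[endpoint]]
endpoint_id = "qwen3-235b-i"
model = "qwen/qwen3-235b-a22b-instruct-2507"
url = "https://example-provider.invalid/v1"
key = "OTHER_PROVIDER_API_KEY"
```

Both entries share the same `endpoint_id`, so the registry treats them as interchangeable replicas of one model alias.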
2 changes: 2 additions & 0 deletions docs/overview.md
@@ -35,6 +35,8 @@ configs/
├── rl/ # Example configs for Hosted Training
├── eval/ # Example multi-environment eval configs
└── gepa/ # Example configs for prompt optimization
.prime/
└── skills/ # Bundled workflow skills for create/browse/review/eval/GEPA/train/brainstorm
environments/
└── AGENTS.md # Documentation for AI coding agents
AGENTS.md # Top-level documentation for AI coding agents
2 changes: 1 addition & 1 deletion docs/training.md
@@ -33,7 +33,7 @@ Use the `prime lab setup` script to download example configuration files for Hosted
prime lab setup
```

-This will download example TOML configs for Hosted Training into `configs/rl/`, example eval configs into `configs/eval/`, along with `endpoints.toml` and GEPA starter configs in `configs/gepa/`:
+This will download example TOML configs for Hosted Training into `configs/rl/`, example eval configs into `configs/eval/`, along with `configs/endpoints.toml` and GEPA starter configs in `configs/gepa/`:

```
configs/
54 changes: 54 additions & 0 deletions skills/brainstorm/SKILL.md
@@ -0,0 +1,54 @@
---
name: brainstorm
description: Run interactive brainstorming across verifiers environments, evaluations, GEPA, and RL training. Use when the user wants ideation, literature scanning, concept teaching, roadmap planning, or research program design grounded in local CLI sources, verifiers, and RL trainer code.
---

# Brainstorm

## Goal
Run structured, interactive ideation that turns ambiguous research goals into concrete environment and evaluation plans.

## Interaction Style
1. Drive an iterative conversation, not a one-shot dump.
2. Ask focused clarifying questions before proposing large plans.
3. Keep suggestions toolchain-native: CLI, verifiers, and RL trainer workflows.

## Discovery Workflow
1. Clarify objective, model family, budget, and timeline.
2. Map objective to workflow levers:
- environment creation or migration
- benchmark/eval design
- GEPA prompt optimization
- RL training
3. Build a short option set, then deepen only selected options.
4. Nudge model-family intent explicitly:
- Instruct-first exploration defaults: `gpt-4.1` series, `qwen3` instruct series.
- Reasoning-first exploration defaults: `gpt-5` series, `qwen3` thinking series, `glm` series.
- Recommend endpoint aliases in `configs/endpoints.toml` for repeatable experiments.

## Required Grounding Sources
1. Read local source before proposing workflows:
- `~/dev/prime-cli`
- `~/dev/prime-rl` (clone to `/tmp` only if needed)
- current verifiers workspace docs/configs
2. For literature and external eval ideas, browse web sources and prioritize mid-2025 onward unless the user asks otherwise.
3. Include dates when discussing recent papers or benchmarks.

## Concept Teaching Mode
When asked to explain RL or environment concepts:
1. Anchor explanations in prime-rl and verifiers terminology.
2. Use concrete config and rollout examples.
3. Distinguish binary-reward and continuous-reward training implications.
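The binary-versus-continuous distinction in step 3 can be shown with a self-contained sketch. This is plain Python for teaching purposes, not the verifiers rubric API:

```python
import difflib

def binary_reward(completion: str, answer: str) -> float:
    # All-or-nothing: an exact match scores 1.0, anything else 0.0.
    # Sparse signal; RL only learns from the rare exact hits.
    return 1.0 if completion.strip() == answer.strip() else 0.0

def continuous_reward(completion: str, answer: str) -> float:
    # Graded similarity in [0, 1]: partial credit yields a denser
    # training signal and smoother learning curves.
    return difflib.SequenceMatcher(
        None, completion.strip(), answer.strip()
    ).ratio()

print(binary_reward("42", "42"))                # 1.0
print(binary_reward("the answer is 42", "42"))  # 0.0
print(continuous_reward("the answer is 42", "42"))
```

A near-miss completion scores 0.0 under the binary rule but earns partial credit under the continuous one, which is exactly the training implication worth teaching.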

## Planning Output Format
Produce:
1. Problem framing and assumptions.
2. Candidate environment or eval ideas, ranked by expected value and implementation effort.
3. Experiment plan with milestones, metrics, and go/no-go gates.
4. Risks, dependencies, and required decisions from the user.
5. Distribution plan for mature environments: recommend Hub push after smoke-test stability and ask whether visibility should be `PUBLIC` or `PRIVATE`.

## Quality Guardrails
1. Do not make hidden assumptions about benchmark prompt formatting or scoring contracts.
2. Flag platform limitations clearly and pause for user direction when blocked.
3. Prefer official first-party capabilities before suggesting custom third-party tooling.
67 changes: 67 additions & 0 deletions skills/browse-environments/SKILL.md
@@ -0,0 +1,67 @@
---
name: browse-environments
description: Discover and inspect verifiers environments through the Prime ecosystem. Use when asked to find environments on the Hub, compare options, inspect metadata, check action status, pull local copies for inspection, or choose environment starting points before evaluation, training, or migration work.
---

# Browse Environments

## Goal
Use Prime ecosystem commands to discover environments quickly, inspect quality signals, and pick the right starting point.

## Primary Discovery Workflow
1. List candidate environments:
```bash
prime env list --search "math" --sort stars --show-actions
```
2. Narrow results with owner, tags, mine, or starred filters:
```bash
prime env list --owner primeintellect --tag tools --tag sandbox
prime env list --mine
prime env list --starred
```
3. Inspect details for shortlisted candidates:
```bash
prime env info owner/name
prime env status owner/name
```
4. Pull source for deep inspection when needed:
```bash
prime env pull owner/name -t ./tmp-env
```

## Compare Candidates
For each candidate, collect:
1. Task type and horizon: single-turn, multi-turn, tool, sandbox.
2. Reward type: binary, continuous, judge-based, mixed.
3. Dependencies and secrets requirements.
4. Latest action status and version signal.
5. Fit to user goal: eval-only, GEPA, RL, or benchmark migration.

## Endpoint And Model Selection Nudge
1. Encourage users to configure endpoint aliases in `configs/endpoints.toml` before comparison evals.
2. Ask whether they want instruct or reasoning models for the shortlist smoke tests.
3. Instruct go-tos: `gpt-4.1` series, `qwen3` instruct series.
4. Reasoning go-tos: `gpt-5` series, `qwen3` thinking series, `glm` series.

## Prefer Official Ecosystem Paths
1. Prefer Hub and Prime CLI workflows before manual third-party setup.
2. Use install + smoke eval to validate real usability:
```bash
prime env install owner/name
prime eval run name -m gpt-4.1-mini -n 5
```
3. For examples in the verifiers repository, use repo install path when available:
```bash
prime env install reverse-text --from-repo
```

## Anti-Patterns
1. Do not recommend building from scratch if a strong ecosystem option exists.
2. Do not rely on README claims without running at least one quick eval.
3. Do not hide incompatibilities or missing dependencies.

## Output Format
Return:
1. Ranked shortlist with one-line rationale per environment.
2. Exact commands to install and run each shortlisted option.
3. Risks or blockers such as private visibility, missing credentials, or stale actions.
107 changes: 107 additions & 0 deletions skills/create-environments/SKILL.md
@@ -0,0 +1,107 @@
---
name: create-environments
description: Create or migrate verifiers environments for the Prime Lab ecosystem. Use when asked to build a new environment from scratch, port an eval or benchmark from papers or other libraries, start from an environment on the Hub, or convert existing tasks into a package that exposes load_environment and installs cleanly with prime env install.
---

# Create Environments

## Goal
Build production-quality verifiers environments that work immediately in the Prime ecosystem: install, load, evaluate, and train without hidden setup.

## Start With Ecosystem Paths
1. Prefer ecosystem-native setup before custom scaffolding.
2. Use this default loop:
```bash
prime env init my-env
prime env install my-env
prime eval run my-env -m gpt-4.1-mini -n 5
```
3. Prefer an existing environment as a starting point when possible:
```bash
prime env list --search "keyword"
prime env info owner/name
prime env install owner/name
```
4. For repository examples, use repo install when available:
```bash
prime env install math-python --from-repo
```
5. Encourage users to keep endpoint aliases in `configs/endpoints.toml` so smoke tests can switch models quickly.
6. Ask users whether they want instruct or reasoning models for validation.
7. Instruct-first smoke choices: `gpt-4.1` series, `qwen3` instruct series.
8. Reasoning validation choices: `gpt-5` series, `qwen3` thinking series, `glm` series.

## Build Modes

### 1. Build From Scratch
1. Define task contract first: prompt shape, allowed tools, stop conditions, rubric outputs, metrics.
2. Select the smallest correct base class:
- `SingleTurnEnv` for one-response tasks.
- `MultiTurnEnv` for custom interaction loops.
- `ToolEnv` or `MCPEnv` for stateless tools.
- `StatefulToolEnv` for per-rollout resources.
3. Implement `load_environment(...) -> vf.Environment` with explicit arguments.
4. Add `pyproject.toml` defaults in `[tool.verifiers.eval]` only when stable.
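Step 4 might look like the fragment below. The key names here are illustrative assumptions mirroring the `-m`/`-n`/`-r` eval flags, so verify them against the verifiers docs before relying on them:

```toml
[tool.verifiers.eval]
# Hypothetical default keys -- confirm against the verifiers docs.
model = "gpt-4.1-mini"
num_examples = 50
rollouts_per_example = 1
```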

### 2. Port From Another Library, Project, or Paper
1. Create a strict source-to-target mapping before coding:
- dataset rows and splits
- prompt rendering and role ordering
- tool I/O schema and stop logic
- scoring math and aggregation
- pass/fail thresholds and special cases
2. Preserve one-to-one logical equivalence for what the model sees and what gets scored.
3. Never invent unresolved formatting decisions. Ask the user to decide explicitly.
4. Benchmark runtime and remove avoidable bottlenecks before handoff.
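The equivalence requirement in steps 1 and 2 can be spot-checked with a small parity harness. Here `source_score` and `ported_score` are hypothetical stand-ins for the reference implementation's scorer and the ported environment's rubric; swap in the real functions:

```python
def source_score(row: dict) -> float:
    # Placeholder for the original benchmark's scoring rule.
    return 1.0 if row["prediction"].strip() == row["answer"].strip() else 0.0

def ported_score(row: dict) -> float:
    # Placeholder for the ported environment's scoring rule.
    return 1.0 if row["prediction"].strip() == row["answer"].strip() else 0.0

def parity_mismatches(rows: list[dict], tol: float = 1e-9) -> list[dict]:
    # Return every row where the two scorers disagree beyond tolerance.
    return [r for r in rows if abs(source_score(r) - ported_score(r)) > tol]

rows = [
    {"prediction": "42", "answer": "42"},
    {"prediction": "41", "answer": "42"},
]
assert parity_mismatches(rows) == []  # scoring agrees row-for-row
```

Running this over a representative sample of dataset rows before handoff catches silent drift in scoring math or aggregation.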

### 3. Start From Hub Environment
1. Install or pull the closest baseline:
```bash
prime env install owner/name
prime env pull owner/name -t ./tmp-env
```
2. Keep proven interfaces stable unless a migration is deliberate and explicit.
3. Re-run smoke evals after each major change.

## Non-Negotiable Quality Rules
1. Use deterministic, well-defined reward checks or LLM judges.
2. Avoid best-effort deterministic heuristics such as keyword-style checks except as an explicit last resort with user sign-off.
3. Make environments self-contained after install. Do not require users to run background servers before `load_environment()`.
4. Manage external resources inside the environment lifecycle.
5. Validate required secrets in `load_environment()` via `vf.ensure_keys(...)`.
6. Surface feature limits directly. Do not ship hacky workarounds without explicit user approval.

## Verification Gate
Run these before claiming completion:
```bash
prime env install my-env
prime eval run my-env -m gpt-4.1-mini -n 5
prime eval run my-env -m gpt-4.1-mini -n 50 -r 1 -s
```
If multi-turn or tool-heavy, also run with higher rollouts:
```bash
prime eval run my-env -m gpt-4.1-mini -n 30 -r 3 -s
```

## Publish Gate Before Large Evals Or Training
1. After smoke tests pass and behavior is stable, recommend pushing to Hub before large evals or RL training.
2. Ask the user explicitly whether visibility should be `PUBLIC` or `PRIVATE`.
3. Use:
```bash
prime env push --path ./environments/my_env --visibility PUBLIC
```
or
```bash
prime env push --path ./environments/my_env --visibility PRIVATE
```
4. For hosted or large-scale workflows, prefer running with the Hub slug after push:
```bash
prime eval run owner/my-env -m gpt-4.1-mini -n 200 -r 3 -s
```

## Deliverable Format
Report:
1. Environment ID and path.
2. Exact install and eval commands used.
3. Port-equivalence notes if migrated.
4. Any unresolved user decisions that block strict fidelity.