openenv integration #829
Merged
14 commits
97a78c7 openenv integration (willccbb)
be2afb6 checkpoint (willccbb)
22bd4d5 restructure project, simplify setup (willccbb)
292a475 Clarify openenv docs and builds (willccbb)
a274f91 Tidy Environments Hub wording (willccbb)
66dc1ec Merge branch 'main' into will/openenv (willccbb)
ffa97f9 Fix regex for image parsing (willccbb)
0a59aec Fix ty unresolved-import warnings (willccbb)
872b802 Fix ty unresolved-import warnings (willccbb)
454b97c Fix regex for TextArena env IDs (willccbb)
6cbabf3 Handle health poll timeout (willccbb)
4453b70 Investigate openenv-textarena health (willccbb)
9f57d9b Document OpenEnvEnv and cleanup (willccbb)
07cdc76 Document OpenEnvEnv and fix cleanup (willccbb)
@@ -0,0 +1,6 @@

    # THIS IS AUTOGENERATED. DO NOT EDIT MANUALLY
    version = 1
    name = "verifiers"

    [setup]
    script = "uv sync"
@@ -10,6 +10,7 @@ uv.lock

    .ropeproject/
    .scratch/
    .chroma_db/
    /.codex/environments/

    # artifacts
    core.*
@@ -0,0 +1,87 @@

    # openenv-echo

    <a href="https://github.com/PrimeIntellect-ai/verifiers/tree/main/environments/openenv_echo">
    <img src="https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=white" alt="Source Code">
    </a>

    ### Overview

    - **Environment ID**: `openenv-echo`
    - **Short description**: OpenEnv Echo environment via `OpenEnvEnv`, demonstrating MCP tool-calling in Prime Sandboxes.
    - **Tags**: openenv, mcp, tools, example

    ### Datasets

    - **Primary dataset(s)**: Seed-generated episodes (one seed per rollout).
    - **Source links**: Bundled OpenEnv Echo project in `proj/` (copied from OpenEnv).
    - **Split sizes**: 100 train / 50 eval by default (configurable).

    ### Task

    - **Type**: Tool use, multi-turn.
    - **Parser**: Default `Parser` (no special formatting).
    - **Rubric overview**: `OpenEnvEpisodicSumRubric` sums per-step rewards; `MultiTurnMonitorRubric` tracks turn count.

    ### Quickstart

    Build and register the bundled OpenEnv Docker image in the Prime registry:

    ```bash
    uv run vf-build openenv-echo
    ```

    This writes `environments/openenv_echo/proj/.build.json` with the fully qualified image reference and runtime metadata.

    Verify the image is ready (status **Ready** or **Completed**):

    ```bash
    prime images list
    ```

    Run an evaluation with default settings:

    ```bash
    prime eval run openenv-echo
    ```

    Configure model and sampling:

    ```bash
    prime eval run openenv-echo \
      -m gpt-4.1-mini \
      -n 20 -r 3 -t 1024 -T 0.7
    ```

    Notes:
    - If your environments directory is not `./environments`, run:
      `uv run vf-build openenv-echo -p /path/to/environments`
    - If you customize the bundled OpenEnv project, rerun `uv run vf-build openenv-echo` (the `proj/.build.json` manifest is updated).
    - `openenv_echo.py` defines `render_echo_prompt()` and passes it via `prompt_renderer`
      to keep the initial MCP prompt concise.

    ### Troubleshooting

    If you see errors like `waiting to start: trying and failing to pull image`, it means the image is not available to the sandbox. Common causes:
    - The image build is still running or failed (`prime images list` should show **Ready** or **Completed**).
    - The image reference in `proj/.build.json` is stale or invalid.
    - The image is private or not accessible to your team.

    If `prime images list` shows **Ready** but the sandbox still cannot pull the image, escalate to the platform team with:
    - Image name/tag
    - Build status/output from `prime images list`
    - Sandbox ID and timestamp from the error log

    ### Environment Arguments

    | Arg | Type | Default | Description |
    | --- | ---- | ------- | ----------- |
    | `num_train_examples` | int | `100` | Number of training seeds to generate. |
    | `num_eval_examples` | int | `50` | Number of eval seeds to generate. |
    | `seed` | int | `0` | Base seed for episode generation. |

    ### Metrics

    | Metric | Meaning |
    | ------ | ------- |
    | `reward` | Sum of per-step rewards from the OpenEnv environment. |
    | `num_turns` | Number of turns taken in the rollout. |
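
The Metrics table describes `reward` as the sum of per-step rewards. A minimal sketch of that aggregation follows. It is not the actual `OpenEnvEpisodicSumRubric` implementation: the function name `episodic_sum` is hypothetical, and treating a missing (`None`) step reward as zero is an assumption.

```python
from typing import Iterable, Optional


def episodic_sum(step_rewards: Iterable[Optional[float]]) -> float:
    """Sum per-step rewards for one rollout.

    Assumption: steps that report no reward (None) contribute 0.0.
    """
    return float(sum(r for r in step_rewards if r is not None))


if __name__ == "__main__":
    # Three scored steps and one step without a reward.
    print(episodic_sum([0.0, 0.5, None, 1.0]))  # prints 1.5
```

An empty rollout sums to `0.0` under this sketch; whether the real rubric behaves the same way is not stated in the README.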
@@ -0,0 +1,58 @@

    from pathlib import Path
    from typing import Any, cast

    import verifiers as vf
    from verifiers.types import ChatMessages


    def render_echo_prompt(
        observation: Any,
        *,
        action_schema: dict[str, Any] | None = None,
        context: str = "reset",
        **kwargs: Any,
    ) -> ChatMessages:
        del kwargs
        if not isinstance(observation, dict):
            raise RuntimeError(
                f"openenv-echo prompt renderer expected dict observation, got {type(observation).__name__}."
            )

        messages = observation.get("messages")
        if isinstance(messages, list) and messages:
            return cast(ChatMessages, messages)

        prompt = observation.get("prompt")
        if isinstance(prompt, str) and prompt.strip():
            return cast(ChatMessages, [{"role": "user", "content": prompt}])

        if context == "reset" and isinstance(action_schema, dict):
            return cast(
                ChatMessages,
                [
                    {
                        "role": "user",
                        "content": (
                            "You are connected to an OpenEnv MCP environment. "
                            "Call at least one tool before your final response. "
                            "Action contract: call_tool(tool_name: str, arguments: object)."
                        ),
                    }
                ],
            )

        raise RuntimeError("openenv-echo observation did not include a renderable prompt.")


    def load_environment(
        num_train_examples: int = 100,
        num_eval_examples: int = 50,
        seed: int = 0,
    ):
        return vf.OpenEnvEnv(
            openenv_project=Path(__file__).parent / "proj",
            num_train_examples=num_train_examples,
            num_eval_examples=num_eval_examples,
            seed=seed,
            prompt_renderer=render_echo_prompt,
        )
cursor[bot] marked this conversation as resolved.
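
The renderer in this file tries three sources in order: existing chat messages, a plain-text prompt, and a generic instruction on reset. The dependency-free sketch below mirrors that fallback order so it can be run without `verifiers` installed. The name `render_prompt_sketch`, the `has_action_schema` flag, and the shortened fallback text are illustrative assumptions, not part of the PR.

```python
from typing import Any

Message = dict[str, str]  # stand-in for the entries of verifiers' ChatMessages


def render_prompt_sketch(
    observation: dict[str, Any],
    *,
    has_action_schema: bool = False,
    context: str = "reset",
) -> list[Message]:
    # 1) Prefer chat messages the environment already supplies.
    messages = observation.get("messages")
    if isinstance(messages, list) and messages:
        return messages

    # 2) Otherwise wrap a non-empty plain-text prompt as a single user turn.
    prompt = observation.get("prompt")
    if isinstance(prompt, str) and prompt.strip():
        return [{"role": "user", "content": prompt}]

    # 3) On reset with a known action schema, fall back to a generic instruction.
    if context == "reset" and has_action_schema:
        return [{"role": "user", "content": "Call at least one tool before your final response."}]

    # Nothing renderable: fail loudly rather than send an empty prompt.
    raise RuntimeError("observation did not include a renderable prompt")


if __name__ == "__main__":
    print(render_prompt_sketch({"prompt": "Say hello"}))
    print(render_prompt_sketch({}, has_action_schema=True))
```

Raising when nothing is renderable matches the original function, which ends with a `RuntimeError` instead of returning an empty message list.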
@@ -0,0 +1,10 @@

    {
      "app": "server.app:app",
      "contract": "mcp",
      "environment_id": "openenv-echo",
      "image": "cmaeni8ji0001ql2z5gw8204f/openenv-echo:latest",
      "image_status": "COMPLETED",
      "port": 8000,
      "schema_version": 1,
      "start_command": "sh -lc \"cd /app/env && /app/.venv/bin/uvicorn server.app:app --host 0.0.0.0 --port 8000\""
    }
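
The README's troubleshooting notes point at a stale or incomplete `proj/.build.json` as a common cause of image pull failures. A hedged sketch of a local sanity check over the manifest fields shown above: the helper names are hypothetical, and accepting `READY` alongside `COMPLETED` is an assumption based on the statuses the README mentions for `prime images list` (only `COMPLETED` appears in this manifest).

```python
import json
from pathlib import Path

# Assumed spellings; only "COMPLETED" is confirmed by the manifest in this PR.
READY_STATUSES = {"READY", "COMPLETED"}


def manifest_problems(manifest: dict) -> list[str]:
    """Return human-readable problems found in a build manifest dict."""
    problems = []
    for key in ("image", "port", "start_command"):
        if not manifest.get(key):
            problems.append(f"missing field: {key}")
    status = str(manifest.get("image_status", "")).upper()
    if status not in READY_STATUSES:
        problems.append(f"image not ready (image_status={status or 'unset'})")
    return problems


def check_manifest_file(path: Path) -> list[str]:
    """Load a manifest from disk and run the same checks."""
    return manifest_problems(json.loads(path.read_text()))


if __name__ == "__main__":
    example = {
        "image": "cmaeni8ji0001ql2z5gw8204f/openenv-echo:latest",
        "image_status": "COMPLETED",
        "port": 8000,
        "start_command": "sh -lc \"cd /app/env && /app/.venv/bin/uvicorn server.app:app --host 0.0.0.0 --port 8000\"",
    }
    print(manifest_problems(example))  # prints []
```

This only inspects the local file. It cannot tell whether the sandbox is able to pull the image, so `prime images list` remains the authoritative check.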