Skip to content

[RFC 003] Add MCP (Model Context Protocol) support - Phase 1#224

Merged
Darktex merged 8 commits into
mainfrom
rfc-003-first-pr
Jan 21, 2026
Merged

[RFC 003] Add MCP (Model Context Protocol) support - Phase 1#224
Darktex merged 8 commits into
mainfrom
rfc-003-first-pr

Conversation

@Darktex

@Darktex Darktex commented Nov 25, 2025

Copy link
Copy Markdown
Collaborator

Summary

This PR implements RFC 003: MCP (Model Context Protocol) support for OpenEnv - Phase 1.

  • Add strongly-typed MCP types: Tool, ToolError, ListToolsAction, CallToolAction, ListToolsObservation, CallToolObservation
  • Add MCPEnvironment base class for MCP-enabled environments with reserved tool name validation
  • Add WebSocket MCP message handler for JSON-RPC protocol (tools/list, tools/call)
  • Update echo_env as reference implementation with FastMCP tools

Key Design Decisions

  1. WebSocket-only transport: MCP uses type: "mcp" WebSocket messages (no HTTP /mcp endpoint)
  2. Per-environment MCP client: Each environment instance has its own MCP client
  3. Dual API support: step(ListToolsAction/CallToolAction) for training, WebSocket /mcp for production
  4. Reserved tool names: Hard error if tools named reset, step, state, or close are registered (protects dual API boundary)

Files Changed

Core MCP Infrastructure:

  • src/openenv/core/env_server/mcp_types.py - Tool, ToolError, actions/observations, WS message types
  • src/openenv/core/env_server/mcp_environment.py - MCPEnvironment base class
  • src/openenv/core/env_server/http_server.py - WebSocket MCP handler
  • src/openenv/core/env_server/__init__.py - Exports

Echo Environment Reference:

  • envs/echo_env/server/mcp_server.py - FastMCP tool definitions
  • envs/echo_env/server/echo_environment.py - MCP action support

Tests:

  • tests/core/test_mcp/test_mcp_types.py - 18 tests
  • tests/core/test_mcp/test_mcp_environment.py - 12 tests

Test plan

  • All 30 MCP tests pass: PYTHONPATH=src:envs pytest tests/core/test_mcp/ -v
  • MCP imports work correctly from package
  • Reserved tool names are validated

Review Comments Addressed

  • ✅ Renamed types.pymcp_types.py (Pankit's feedback)
  • ✅ Strongly typed Tool/ToolError classes (Pankit's feedback)
  • ✅ Reserved tool names cause hard error (alignment safeguard)
  • ✅ Added comprehensive tests
  • ✅ WebSocket-only transport (breaking change accommodation)

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 25, 2025
Comment thread rfcs/RFC-003-implementation-journal.md Outdated
@@ -0,0 +1,329 @@
# RFC-003 Implementation Journal

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was written by Claude. Keeping this here while I implement, but I will remove it when the stack lands.

Comment thread src/envs/echo_env/client.py Outdated


class EchoEnv(HTTPEnvClient[EchoAction, EchoObservation]):
class EchoEnv(HTTPEnvClient[CallToolAction, Observation]):

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I followed the HTTPEnvClient convention, but to me this is the weakest part of OpenEnv as we require that people write boilerplate that we can generate for them.

I will tackle this when we go into supporting ToolCall and CodeAct at the same time, as that is a natural fit for generating this boilerplate on the fly.

Comment thread src/core/env_server/mcp_environment.py Outdated
Raises:
ValueError: If MCP client not configured or action type invalid
"""
from .types import CallToolObservation, ListToolsObservation

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude this, but me no like. Will fix.

Comment thread src/core/env_server/http_server.py Outdated
observation = await loop.run_in_executor(
self._executor, self.env.step, action
)
# Handle MCP actions asynchronously (don't use thread pool for async operations)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pankit-eng this is where I need your eyes the most. Does this make sense?

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements the core MCP (Model Context Protocol) infrastructure for OpenEnv as the first phase of RFC-003. It introduces a new MCPEnvironment base class that allows environments to expose tools via the MCP protocol, and converts the echo_env as a reference implementation. The changes enable both gym-style orchestration through HTTP endpoints and direct MCP JSON-RPC access for production agents.

Key Changes:

  • Added MCP action types (ListToolsAction, CallToolAction) and corresponding observations to the core type system
  • Implemented MCPEnvironment base class that handles MCP client/server communication boilerplate
  • Extended HTTP server with /mcp JSON-RPC endpoint for direct tool access alongside existing /step endpoint
  • Refactored echo_env to use MCP pattern, replacing custom EchoAction/EchoObservation with MCP tools

Reviewed changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 16 comments.

Show a summary per file
File Description
src/core/env_server/types.py Added MCP action and observation types (ListToolsAction, CallToolAction, ListToolsObservation, CallToolObservation)
src/core/env_server/mcp_environment.py New base class providing MCP infrastructure for environments
src/core/env_server/http_server.py Added /mcp JSON-RPC endpoint and MCP action routing in /step handler
src/core/env_server/__init__.py Exported new MCP types and MCPEnvironment class
src/core/pyproject.toml Added mcp and fastmcp dependencies
src/envs/echo_env/server/mcp_server.py New MCP tool definitions for echo environment
src/envs/echo_env/server/echo_environment.py Refactored to extend MCPEnvironment instead of custom implementation
src/envs/echo_env/server/app.py Updated to use base Action/Observation types for MCP support
src/envs/echo_env/models.py Removed custom EchoAction and EchoObservation (deprecated in favor of MCP)
src/envs/echo_env/client.py Updated to handle MCP actions and parse MCP observations
src/envs/echo_env/__init__.py Removed exports of deprecated custom action/observation types
tests/core/mcp/test_mcp.py New test file for MCP client/server integration
examples/test_mcp_integration.py Integration test script demonstrating MCP usage
examples/echo_mcp_demo.py Demo script showing MCP tool interaction
rfcs/RFC-003-implementation-journal.md Implementation journal documenting decisions and progress
Comments suppressed due to low confidence (1)

src/core/env_server/http_server.py:17

  • Import of 'json' is not used.
import json

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.

Comment thread src/core/env_server/http_server.py Outdated
Returns:
JSON-RPC 2.0 response
"""
if self.env.mcp_client is None:

Copilot AI Nov 25, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The /mcp endpoint assumes self.env.mcp_client exists, but not all Environment instances will have this attribute. This will raise an AttributeError for non-MCP environments. Add a hasattr(self.env, 'mcp_client') check or check if the env is an instance of MCPEnvironment first.

Suggested change
if self.env.mcp_client is None:
if not hasattr(self.env, "mcp_client") or self.env.mcp_client is None:

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pankit-eng this is good catch by copilot. Do you think we should just add that line?

Comment thread src/envs/echo_env/client.py Outdated
Comment on lines +53 to +58
>>> tools = client.list_tools()
>>> print(tools) # [{"name": "echo_message", ...}]
>>>
>>> # Call echo_message tool
>>> result = client.echo_message("Hello!")
>>> print(result["echoed_message"]) # "Hello!"

Copilot AI Nov 25, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring mentions list_tools() and echo_message() methods that are not implemented in the EchoEnv class. Either implement these convenience methods or update the documentation to reflect the actual API (using step() with ListToolsAction and CallToolAction).

Copilot uses AI. Check for mistakes.
Comment thread examples/echo_mcp_demo.py Outdated

# List available tools
print("2. Listing available tools...")
tools = client.list_tools()

Copilot AI Nov 25, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list_tools() method is not implemented in the EchoEnv client class (see client.py). This call will fail with an AttributeError. Either implement the method in the client or use the step() method with a ListToolsAction.

Copilot uses AI. Check for mistakes.
Comment thread examples/echo_mcp_demo.py Outdated

# Call echo_message tool using convenience method
print("3. Calling echo_message tool...")
result = client.echo_message("Hello from MCP!")

Copilot AI Nov 25, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The echo_message() method is not implemented in the EchoEnv client class (see client.py). This call will fail with an AttributeError. Either implement the method in the client or use the step() method with a CallToolAction.

Copilot uses AI. Check for mistakes.
Comment thread src/core/env_server/http_server.py Outdated
"id": None,
}

body = await request.json()

Copilot AI Nov 25, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The /mcp endpoint lacks error handling for malformed JSON. If request.json() fails (e.g., invalid JSON in request body), it will raise an exception. Consider wrapping the await request.json() call in a try-except block and returning a JSON-RPC error with code -32700 ("Parse error") for invalid JSON.

Copilot uses AI. Check for mistakes.
Comment thread src/core/env_server/http_server.py Outdated
Comment on lines +243 to +244
return CallToolAction(
tool_name=action_data["tool_name"],

Copilot AI Nov 25, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accessing action_data["tool_name"] will raise a KeyError if the tool_name field is missing from the request. Use action_data.get("tool_name") with an appropriate error message if it's None, or validate that the field exists before accessing it.

Suggested change
return CallToolAction(
tool_name=action_data["tool_name"],
tool_name = action_data.get("tool_name")
if tool_name is None:
raise ValueError("Missing required field 'tool_name' for CallToolAction")
return CallToolAction(
tool_name=tool_name,

Copilot uses AI. Check for mistakes.
Comment thread src/core/env_server/http_server.py Outdated

try:
# Reuse MCP client from environment (avoids creating duplicate client)
async with self.env.mcp_client:

Copilot AI Nov 25, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MCP client is being used within an async with block in the /mcp endpoint (line 168) and also in _handle_mcp_action (mcp_environment.py line 133). This could lead to issues if the client doesn't support being entered multiple times concurrently. Consider whether the client should be managed at the environment level or if each endpoint should manage its own context.

Copilot uses AI. Check for mistakes.
Comment thread src/core/env_server/mcp_environment.py Outdated

self.mcp_server = mcp_server
self.mcp_client = Client(mcp_server)
super().__init__(mcp_client=self.mcp_client)

Copilot AI Nov 25, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Environment.__init__ base class doesn't accept an mcp_client parameter. This call will fail with a TypeError. The base class __init__ signature is def __init__(self, transform: Transform | None = None) (see interfaces.py line 95).

Either:

  1. Remove the mcp_client=self.mcp_client argument here, or
  2. Update the Environment.__init__ signature to accept mcp_client as an optional parameter
Suggested change
super().__init__(mcp_client=self.mcp_client)
super().__init__()

Copilot uses AI. Check for mistakes.
Comment thread src/envs/echo_env/client.py Outdated
Comment on lines +18 to +25
from core.env_server.types import (
CallToolAction,
CallToolObservation,
ListToolsAction,
ListToolsObservation,
Observation,
State,
)

Copilot AI Nov 25, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'ListToolsAction' is not used.

Copilot uses AI. Check for mistakes.
Comment thread src/envs/echo_env/client.py Outdated
Comment on lines +29 to +36
from openenv_core.env_server.types import (
CallToolAction,
CallToolObservation,
ListToolsAction,
ListToolsObservation,
Observation,
State,
)

Copilot AI Nov 25, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'ListToolsAction' is not used.

Copilot uses AI. Check for mistakes.
@Darktex

Darktex commented Nov 25, 2025

Copy link
Copy Markdown
Collaborator Author

@copilot I accept all your suggestions. Can you implement these changes?

Copilot AI commented Nov 25, 2025

Copy link
Copy Markdown

@Darktex I've opened a new pull request, #226, to work on those changes. Once the pull request is ready, I'll request review from you.

Comment thread src/core/env_server/types.py Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See if we can rename the file name and not use types.py. It starts conflicting usually.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good point

Comment thread examples/echo_mcp_demo.py Outdated

# Reset the environment
print("1. Resetting environment...")
result = client.reset()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is resetting a pre-req to list_tools call? Or in other words, does reset() semantically have any impact on the list_tools?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind that list_tools() should be removed since instead we are just going to allow env.step(ListToolsAction()). Apart from that, hmmm I think that we may still want you to call .reset() first to initialize the env, in case some of your tools require state (e.g. a driving sim)

Comment thread src/core/env_server/types.py Outdated
Contains the list of available tools with their schemas.
"""

tools: List[Dict[str, Any]] = field(default_factory=list)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Darktex - can we use a strongly types schema(openAPI or similar) like how MCP list/tools does.
Each ToolSpec is defined as:

{
  name: string;          // Unique identifier for the tool
  description?: string;  // Human-readable description
  inputSchema: {         // JSON Schema for the tool's parameters
    type: "object",
    properties: { ... }  // Tool-specific parameters
  }
}

And this API returns a list of the above typed object:

[ToolSpec]

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good callout

Comment thread src/core/env_server/types.py Outdated
metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass(kw_only=True)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would strongly recommend that we create typed objects for tools and use that in client as well as the server:

@dataclass
class Tool:
    name: str
    description: str
    input_schema: Dict[str, Any]
    output_schema: Dict[str, Any]

Comment thread src/core/env_server/types.py Outdated
Contains the result of calling a tool, or an error if the call failed.
"""

result: Optional[Any] = None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All fields must be required for a CallToolObservation. Lets remove optional

We should be specific about error here: error coming from the tool execution needs to be part of the result object and not the error field. In other words, we should assume that tool will bundle the error in the Result and we will return the same as is in the result field.

For any other errors including but not limited to - invalid args, transport level issue, etc.., we can have an error field as we have here. Error field could also be more expanded:

class Error:
   error_type: enum
   error_str: description
   ....

Comment thread src/core/env_server/mcp_environment.py Outdated
from fastmcp import Client

self.mcp_server = mcp_server
self.mcp_client = Client(mcp_server)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would imagine that we will need to support multiple mcp servers and hence, multiple mcp clients in one environment. Are you planning on adding the routing support in a separate PR for the same?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pankit-eng Correct! For now, let's do a single, local MCP server

@Darktex

Darktex commented Dec 1, 2025

Copy link
Copy Markdown
Collaborator Author

@copilot Can you send me a PR implementing Pankit's comments?

Copilot AI commented Dec 1, 2025

Copy link
Copy Markdown

@Darktex I've opened a new pull request, #230, to work on those changes. Once the pull request is ready, I'll request review from you.

@Darktex

Darktex commented Dec 1, 2025

Copy link
Copy Markdown
Collaborator Author

@copilot solve merge conflicts and rebase to Main

Copilot AI commented Dec 1, 2025

Copy link
Copy Markdown

@Darktex I've opened a new pull request, #231, to work on those changes. Once the pull request is ready, I'll request review from you.

@Darktex Darktex force-pushed the rfc-003-first-pr branch 2 times, most recently from c7072a2 to 7a74252 Compare December 5, 2025 19:25
Comment thread pyproject.toml
"if TYPE_CHECKING:",
]

[tool.pytest.ini_options]

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude added this to fix pytest. This is the sort of "pytest hell" that I have encountered before and this looks like the sort of StackOverflow solution I would copypasta to fix things. Does this sound reasonable to people? @pankit-eng

@Darktex

Darktex commented Jan 13, 2026

Copy link
Copy Markdown
Collaborator Author

Note: This is an automated review by Claude Code (alignment-reviewer agent), not a human review. The account posting this is shared with the human maintainer.


Since I cannot access the PR directly via gh, let me analyze the PR based on the diff provided in the user's message. The PR implements RFC-003 MCP support. Let me create a comprehensive review based on the diff and the design documents I've read.

PR #224 Review: [RFC 003] [1/n] Implement basic local MCP server & add MCP client

Previous Review Summary

Key points from existing reviews:

  1. Copilot identified an unused import json in http_server.py:17
  2. Multiple reviewers (@Darktex, @pankit-eng) have commented, with @pankit-eng requesting changes
  3. Copilot created follow-up PRs (Address code review feedback for MCP implementation #226, Add strongly typed MCP types: Tool, ToolError, rename types.py to mcp_types.py #230, Merge main and rename MCP types to mcp_types.py #231) to address reviewer feedback
  4. Merge conflicts were identified that need resolution and rebase to main

Tier 1: Critical Issues (Bugs, Security, Lint)

🔴 BLOCKER: Lint Check Failed

The lint hook requires uv to be installed, which is missing from the environment. However, based on the code review, I can identify several issues:

🔴 BUG: Unused Import

File: src/core/env_server/http_server.py:17

import json

This import is never used in the file. Remove it.

Impact: Code cleanliness violation (already flagged by Copilot).


🟡 BUG: Missing Error Handling in MCP Endpoint

File: src/core/env_server/http_server.py:132-235

The /mcp endpoint has several error handling gaps:

  1. No timeout handling: MCP client calls (list_tools(), call_tool()) have no timeout, which could hang the server indefinitely.

  2. Context manager reuse issue: The code uses async with self.env.mcp_client inside the endpoint, but mcp_client may already be in use by another request. This could cause issues with concurrent requests.

# Line 169-170
async with self.env.mcp_client:
    if method == "tools/list":

Recommended fix:

  • Add timeout to MCP operations
  • Verify thread-safety of MCP client reuse
  • Add proper concurrent request handling

🟡 POTENTIAL BUG: Double Async Context Manager

File: src/core/env_server/mcp_environment.py:138-141

async with self.mcp_client:
    if isinstance(action, ListToolsAction):
        tools = await self.mcp_client.list_tools()

The MCPEnvironment uses async with self.mcp_client in _handle_mcp_action(), which is also called from the HTTP endpoint that may already have an active context. This could lead to nested context managers on the same client.

Verification needed: Check if FastMCP's Client supports nested async with contexts.


🟡 INCONSISTENT ERROR HANDLING

File: src/core/env_server/mcp_environment.py:154-167

The CallToolAction handler catches all exceptions and converts them to ToolError:

except Exception as e:
    return CallToolObservation(
        done=False,
        tool_name=action.tool_name,
        error=ToolError(
            error_type=ToolErrorType.EXECUTION_ERROR,
            message=str(e),
        ),
    )

However, ListToolsAction handler (lines 140-150) has NO exception handling. If list_tools() fails, it will crash the server.

Fix: Add try/except around list_tools() and return appropriate error observation.


🔴 MISSING FILE: models.py

File: src/envs/echo_env/models.py

The diff shows this file was deleted (deleted file mode 100644), but the client.py still has comments suggesting deprecated types exist:

# From client.py imports (lines 7-16)
try:
    from core.client_types import StepResult
    # ... no mention of models.py

However, the implementation journal mentions:

Deprecate src/envs/echo_env/models.py (EchoAction deprecated)

Was this file supposed to be kept with deprecation warnings, or fully deleted? The RFC journal says "deprecate" but the diff shows "delete".

VERIFICATION NEEDED: Confirm whether backward compatibility was intentionally broken or if deprecation warnings should have been kept.


Tier 2: Alignment Issues (Principles & Invariants)

⚠️ ALIGNMENT FLAG: Dual API Boundary Violation Risk

Invariant at risk: Dual API Boundary (INVARIANTS.md:41-58)

The concern: The /mcp endpoint exposes MCP tools directly to any HTTP client, which could potentially allow agents to access infrastructure-level operations if environment builders mistakenly expose reset() or simulation control as MCP tools.

From INVARIANTS.md:

Critical: The Gym-like API is NOT accessible to the agent being trained.
Violations to flag:

  • Exposing reset(), step(), or state() via MCP tools

Current risk: The architecture ASSUMES environment builders won't expose simulation control as tools, but there's no programmatic enforcement.

Recommended safeguards:

  1. Add a reserved keyword check that prevents tools named reset, step, state from being registered
  2. Document this requirement prominently in MCPEnvironment docstring
  3. Add runtime validation in MCPEnvironment.__init__() that scans tool names

Suggested reviewer: @Darktex (RFC author)


⚠️ POTENTIAL PRINCIPLE VIOLATION: Missing Type Safety

Principle: "Type safety with generics and Pydantic across the wire" (PRINCIPLES.md:16)

File: src/core/env_server/http_server.py:88-100

The /step endpoint now handles MCP actions with a runtime type check instead of leveraging the type system:

# Lines 103-105
if isinstance(action, (ListToolsAction, CallToolAction)):
    if not isinstance(self.env, MCPEnvironment):
        raise RuntimeError(...)

This bypasses the generic type system (Environment[ActT, ObsT]). The environment's action_cls parameter is ignored for MCP actions.

Issue: This violates the principle of using generics across the wire. MCP actions are handled via special-case runtime checks rather than being part of the action type union.

Alternative approach (more type-safe):

  • Define MCPAction = ListToolsAction | CallToolAction
  • Have environments declare: Environment[MCPAction | CustomAction, Observation]
  • Let the type system handle this instead of runtime isinstance checks

Impact: Medium - Works correctly but doesn't follow the repo's type safety principle.


⚠️ DESIGN CONCERN: Sync step() Method in Async-First Environment

File: src/core/env_server/mcp_environment.py:82-110

The MCPEnvironment.step() method is synchronous but calls async MCP operations via asyncio.run():

def step(self, action: Action) -> Observation:
    # ...
    return asyncio.run(self._handle_mcp_action(action))

Documentation says (line 89-92):

NOTE: This is a sync method that internally runs async MCP operations.
When called from sync context, it uses asyncio.run(). When called from
HTTPEnvServer (async context), HTTPEnvServer intercepts MCP actions and
calls _handle_mcp_action() directly...

Problem: This creates two code paths with different behavior:

  1. Direct env.step() calls → asyncio.run() (new event loop)
  2. HTTP /step endpoint → _handle_mcp_action() (existing event loop)

This violates the minimize lifecycle deltas principle (PRINCIPLES.md:7) - the same step() call behaves differently depending on how it's invoked.

Recommended approach:

  • Make Environment.step() async across the board (breaking change, needs RFC)
  • OR: Document this as a known limitation of MCP environments in the implementation journal

📋 MISSING DOCUMENTATION: No Migration Guide

From RFC-003 Implementation Journal (lines 119-120):

**PR #2: Environment Migration**
- Convert coding_env, browsergym_env, etc. to MCP pattern
- Document migration guide for environment authors

The migration guide is deferred to PR #2, but existing environment authors need guidance now on:

  1. How to convert existing environments to MCP
  2. What breaks vs. what's backward compatible
  3. When to use MCPEnvironment vs. keep existing Environment pattern

Impact: Low (deferred to future PR), but should be tracked.


🟡 INCONSISTENCY: Missing MCP Dependencies in echo_env Dockerfile

From Implementation Journal Checklist (line 103):

- [ ] Update echo_env Dockerfile dependencies

This is marked incomplete. The PR adds MCP dependencies to src/core/pyproject.toml but the echo_env Dockerfile may not have these dependencies.

Verification needed: Check if envs/echo_env/server/Dockerfile builds successfully with new MCP dependencies.


Tier 2: Code Quality & Best Practices

📝 INCOMPLETE TESTS

From Implementation Journal (lines 107-110):

#### Tests
- [ ] Create `tests/core/mcp/test_client.py`
- [ ] Create `tests/core/mcp/test_server.py`
- [ ] Update `tests/envs/test_echo_env.py`
- [ ] Verify other envs still work (no regressions)

All test items are incomplete. The PR includes example scripts but no pytest tests.

Impact: High - No automated testing for core MCP functionality. Manual testing only.


📝 DEBUG CODE IN EXAMPLES

File: examples/test_mcp_integration.py

Lines 10-60 contain extensive print statements:

print("=" * 60)
print("Testing MCP Integration")
print("=" * 60)
# ... many more print statements

This is acceptable for an example/demo script, but the file is named test_mcp_integration.py which suggests it should be a pytest test, not a manual script.

Recommendation: Either:

  1. Rename to demo_mcp_integration.py to clarify it's a manual demo
  2. OR: Convert to proper pytest test in tests/ directory

📝 MISSING DOCSTRINGS

Several new files lack module-level docstrings:

  • src/core/env_server/mcp_types.py - Has docstring ✅
  • src/envs/echo_env/server/mcp_server.py - Missing

Low priority but should be added for consistency.


Summary & Recommendations

Tier 1 Issues (Must Fix Before Merge)

  1. ✅ Remove unused import json in http_server.py
  2. ✅ Add exception handling for list_tools() in MCPEnvironment._handle_mcp_action()
  3. ⚠️ Verify MCP client context manager safety with concurrent requests
  4. ⚠️ Add timeout to MCP operations in /mcp endpoint
  5. ⚠️ Clarify models.py deletion vs deprecation strategy

Tier 2 Issues (Should Address)

  1. ALIGNMENT FLAG: Add safeguards against exposing simulation control (reset, step, state) as MCP tools
  2. Consider making type handling more consistent with generics instead of runtime isinstance checks
  3. Document the sync/async behavior difference between direct step() calls and HTTP endpoint calls
  4. Add pytest tests before merging (currently all test checklist items incomplete)
  5. Update echo_env Dockerfile with MCP dependencies
  6. Rename test_mcp_integration.pydemo_mcp_integration.py or convert to pytest

Deferred (Future PRs)

  1. Migration guide (PR Remove envtorch dir nesting and add base env code #2)
  2. Comprehensive environment migration (PR Remove envtorch dir nesting and add base env code #2)
  3. CodeAct integration (PR Add boiler plate code for CodingEnv #3)
  4. Performance optimization for local servers (PR Add BaseHTTPClient to talk to Envs via JSON over RPC. #4)

Overall Assessment

Architecture: ✅ Solid implementation of RFC-003 core infrastructure

Code Quality: ⚠️ Missing tests and some error handling gaps

Alignment: ⚠️ One critical alignment flag regarding dual API boundary enforcement

Recommendation: Request changes to address Tier 1 issues and the ALIGNMENT FLAG before merging. The architecture is sound, but the safeguards and error handling need strengthening.


Automated review by Claude Code | Learn more about OpenEnv's agentic workflow

@Darktex Darktex changed the title [RFC 003] [1/n] Implement basic local MCP server & add MCP client [RFC 003] Add MCP (Model Context Protocol) support - Phase 1 Jan 13, 2026
@Darktex Darktex requested a review from burtenshaw January 16, 2026 19:40
burtenshaw

This comment was marked as outdated.

@burtenshaw burtenshaw left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally looks great. I left some small nits mainly on consistncy. Other than that, I'd consider adding some tests. For example:

  • test the EchoEnvironment mcp faturs like list and call tools.
  • test passing a FastMCP in a test MCPEnvironment
  • test tools list and call via websockets

self._state = State(episode_id=str(uuid4()), step_count=0)
self._reset_count = 0
# Initialize MCP client for the local MCP server
self._mcp_client = Client(mcp)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# echo_environment.py line 77
self._mcp_client = Client(mcp)  # Private
Suggested change
self._mcp_client = Client(mcp)
self.mcp_client = Client(mcp)

return await self._mcp_client.list_tools()

# Run async operation synchronously
tools_result = asyncio.get_event_loop().run_until_complete(_list_tools())

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tools_result = asyncio.get_event_loop().run_until_complete(_list_tools())
tools_result = asyncio.run(self._async_list_tools())


try:
# Run async operation synchronously
result = asyncio.get_event_loop().run_until_complete(_call_tool())

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
result = asyncio.get_event_loop().run_until_complete(_call_tool())
result = asyncio.run(self._async_list_tools())


return CallToolObservation(
tool_name=action.tool_name,
result=tool_result,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice hear that result is a string of the text content but in MCPEnvironment it is a tool call response. Is the plan for this to be used flexibly, or should we be consistent?

*no opinion, just curious.

)
response = WSMCPResponse(data={
"jsonrpc": "2.0",
"result": result,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case result is the raw object which may not be JSON serializable.


try:
# Run the async call_tool with timeout
result = asyncio.run(

@burtenshaw burtenshaw Jan 16, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we gracefully handle this async call in case we're running in an async loop? I think we'll need this to work with the new async client.

For example, you could put the tool cool in a function like _call and do:

try:
    loop = asyncio.get_running_loop()
except RuntimeError:
    # no loop running
    loop = None

if loop is not None:
    # already in async context
    import concurrent.futures
    with concurrent.futures.ThreadPoolExecutor() as pool:
        future = pool.submit(asyncio.run, _call())
        result = future.result()
else:
    # no async context
    result = asyncio.run(_call())

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming back to this. We might want a module core that handles our async operations because they're becoming important to the lib.

Implements RFC 003 for MCP integration in OpenEnv:

Core MCP Infrastructure:
- Add mcp_types.py with Tool, ToolError, ListToolsAction, CallToolAction,
  and corresponding observations
- Add MCPEnvironment base class with reserved tool name validation,
  timeout handling, and proper error categorization
- Add WebSocket MCP message handler for JSON-RPC via "type: mcp" messages

Echo Environment Reference Implementation:
- Add mcp_server.py with echo_message and echo_with_length tools
- Update echo_environment.py with MCP action support while maintaining
  backwards compatibility with EchoAction

Design decisions:
- WebSocket-only for MCP (no HTTP /mcp endpoint)
- Per-environment MCP client lifecycle
- Both step() and WebSocket /mcp paths supported
- reset() NOT required for list_tools()
- Reserved tool names (reset, step, state, close) raise hard error

Adds fastmcp>=2.0.0 dependency and comprehensive tests.
The directory name 'mcp' was shadowing the external mcp package
during pytest collection, causing 'No module named mcp.types' errors.
This commit addresses outstanding review comments from burtenshaw,
Copilot, and Claude's automated review on PR 224.

Changes:
- Add integration tests for EchoEnvironment MCP features (list/call tools)
- Add integration tests for MCPEnvironment with FastMCP servers
- Add WebSocket MCP tools/list and tools/call tests
- Fix EchoEnvironment to use asyncio.run() instead of deprecated get_event_loop()
- Fix WebSocket MCP handler to use async context manager for MCP client
- Fix MCPEnvironment tool name validation to check _tools attribute
- Add echo_mcp_demo.py example showing canonical SIM mode pattern
- Expose mcp_client as public attribute on EchoEnvironment for WebSocket support
The WebSocket tests use synchronous TestClient, not async/await,
so the asyncio marks were causing warnings.
…servations)

The function in prompts.py is named format_observations (plural) but
was incorrectly imported as format_observation (singular).
EchoEnvironment now properly inherits from MCPEnvironment instead of
duplicating MCP handling logic. This:
- Demonstrates the canonical way to build an MCPEnvironment subclass
- Removes code duplication between EchoEnvironment and MCPEnvironment
- Uses _step_impl() for domain-specific EchoAction handling

Also updated tests and demo to handle CallToolResult objects returned
by MCPEnvironment (accessing .data attribute for the actual value).
@Darktex Darktex merged commit 7f24d1f into main Jan 21, 2026
4 checks passed
@Darktex Darktex deleted the rfc-003-first-pr branch January 21, 2026 00:06
@Darktex

Darktex commented Jan 21, 2026

Copy link
Copy Markdown
Collaborator Author

Whops, merged this too soon. Created #312 to address your comments @burtenshaw

@greptile-apps greptile-apps Bot mentioned this pull request Feb 23, 2026
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants