@pankit-eng
Contributor

This PR opens the OpenEnv 0.1 RFC for discussion, focusing on:

  1. The baseline API that an environment should provide
  2. Packaging and runtime interfaces: Docker
  3. Communication interfaces: HTTP
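To make items 1 and 3 concrete, here is a minimal client-side sketch of the three baseline calls (`reset`, `step`, `state`) behind a generic, typed client, in the spirit of the `HTTPEnvClient` / `Generic[ActT, ObsT]` pattern discussed in the review. It is not the OpenEnv implementation. `EnvClient`, `Transport`, the `/reset`, `/step` and `/state` paths, the payload keys, and the guessing-game types are all hypothetical. The transport is injected as a plain callable so the sketch runs without a server. A real client would send these payloads to the environment's HTTP endpoint instead.

```python
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, Generic, Optional, TypeVar

# Hypothetical transport: (path, JSON-like payload) -> JSON-like response.
# A real client would send the payload to the environment server over HTTP.
Transport = Callable[[str, Dict[str, Any]], Dict[str, Any]]

ActT = TypeVar("ActT")
ObsT = TypeVar("ObsT")


class EnvClient(ABC, Generic[ActT, ObsT]):
    """Generic client: subclasses define how actions are encoded and observations parsed."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    @abstractmethod
    def _encode_action(self, action: ActT) -> Dict[str, Any]:
        """Turn a typed action into a JSON-serializable dict."""

    @abstractmethod
    def _parse_observation(self, payload: Dict[str, Any]) -> ObsT:
        """Turn a server response into a typed observation."""

    def reset(self) -> ObsT:
        return self._parse_observation(self._transport("/reset", {}))

    def step(self, action: ActT) -> ObsT:
        payload = {"action": self._encode_action(action)}
        return self._parse_observation(self._transport("/step", payload))

    def state(self) -> Dict[str, Any]:
        # Returned as a raw dict here; which fields the server exposes is a design choice.
        return self._transport("/state", {})


@dataclass
class GuessAction:
    guess: int


@dataclass
class GuessObservation:
    hint: str
    done: bool = False
    reward: Optional[float] = None  # may be absent on some steps
    metadata: Dict[str, Any] = field(default_factory=dict)


class GuessEnvClient(EnvClient[GuessAction, GuessObservation]):
    def _encode_action(self, action: GuessAction) -> Dict[str, Any]:
        return asdict(action)

    def _parse_observation(self, payload: Dict[str, Any]) -> GuessObservation:
        return GuessObservation(**payload["observation"])


def fake_transport(path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """In-memory stand-in for a server whose secret number is 7."""
    if path == "/reset":
        return {"observation": {"hint": "guess a number"}}
    if path == "/step":
        correct = payload["action"]["guess"] == 7
        return {"observation": {"hint": "correct" if correct else "wrong",
                                "done": correct, "reward": 1.0 if correct else 0.0}}
    return {"episode_id": "demo", "step_count": 0}


client = GuessEnvClient(fake_transport)
first = client.reset()
result = client.step(GuessAction(guess=7))
```

Each concrete client only supplies `_encode_action` and `_parse_observation`. This follows the RFC's idea that per-environment client classes parse server responses into their own data models, while `reset`/`step`/`state` stay shared.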

Everything proposed here is already available on the master branch, so you can try it out and give feedback based on your experience with it.

NOTE: Extensions supporting observability and MCP tools will follow this baseline API spec RFC in order to ensure
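Item 2 above names Docker as the packaging and runtime interface. Below is a minimal sketch of an abstract container-orchestration base with a Docker-backed implementation. It assumes the provider shells out to the `docker` CLI: `docker run -d -p HOST:CONTAINER IMAGE` starts a detached container and prints its ID, and `docker rm -f ID` stops and removes it. `ContainerProvider`, `DockerProvider`, `stop_container`, the `port` parameter, and the same-port mapping are illustrative assumptions, not the RFC's signatures. The command runner is injected so the logic can be exercised without Docker installed.

```python
import subprocess
from abc import ABC, abstractmethod
from typing import Callable, List

# Hypothetical runner: executes a command and returns its stdout.
Runner = Callable[[List[str]], str]


def run_command(cmd: List[str]) -> str:
    """Default runner: run a real command and return its stripped stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


class ContainerProvider(ABC):
    """Abstract base for container orchestration (names are illustrative)."""

    @abstractmethod
    def start_container(self, image: str, port: int = 8000) -> str:
        """Start `image` and return an identifier for the running container."""

    @abstractmethod
    def stop_container(self, container_id: str) -> None:
        """Stop and remove a container started by this provider."""


class DockerProvider(ContainerProvider):
    def __init__(self, runner: Runner = run_command) -> None:
        self._runner = runner

    def start_container(self, image: str, port: int = 8000) -> str:
        # Assumes the environment server listens on the same port inside the container.
        # `docker run -d` detaches and prints the new container ID on stdout.
        cmd = ["docker", "run", "-d", "-p", f"{port}:{port}", image]
        return self._runner(cmd).strip()

    def stop_container(self, container_id: str) -> None:
        # `docker rm -f` stops the container if it is running, then removes it.
        self._runner(["docker", "rm", "-f", container_id])
```

A real provider would probably also need to wait until the server inside the container accepts HTTP connections before a client calls `reset()`.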

@pankit-eng pankit-eng requested review from Darktex and jspisak October 14, 2025 17:51
@meta-cla meta-cla bot added the CLA Signed label Oct 14, 2025
@pankit-eng pankit-eng changed the title Add OpenEnv 0.1 RFC for excution environment Add OpenEnv 0.1 RFC for exceution environment Oct 14, 2025
updated the name to OpenEnv
@jspisak
Contributor

jspisak commented Oct 15, 2025

Will we link this from the top-level README to make sure folks see it?

@zkwentz zkwentz changed the title Add OpenEnv 0.1 RFC for exceution environment [RFC 001] - Add OpenEnv 0.1 RFC for execution environment Oct 15, 2025
@zkwentz zkwentz changed the title [RFC 001] - Add OpenEnv 0.1 RFC for execution environment [RFC 001] - Baseline API and Interface Specifications Oct 15, 2025
```text
┌─────────────────────────────────────────────────────────┐
│ RL code(Client Application)                             │
│ RL code(Client Application)                             │
```

double line

```text
│  │ (HTTPEnvClient)│              │  (HTTPEnvClient) │   │
│  └────────┬───────┘              └────────┬─────────┘   │
└───────────┼───────────────────────────────┼─────────────┘
            │ HTTP (reset, step, state)     │ HTTP
```

I'm not sure we should expose `state`: the model is that the environment keeps it private and only returns what the agent is allowed to see via the observation. If you are playing chess or having a 1:1 conversation, you are allowed to see everything, so it doesn't matter. But it does matter in many real-life applications that involve imperfect information (e.g. poker, where you don't see other players' hands, or a driving sim, where some cars move out of your view because buildings or other cars occlude them).


```python
    @property
    @abstractmethod
    def state(self) -> State:
```

Nothing in Python is really private, so I'm not sure how we'd enforce this.

> #### 1. Environment (Server-Side)

```python
class Environment(ABC):
```

If we add a way of discovering actions (perhaps the topic of another RFC), it will have to propagate back here.

> - Generic types (`Generic[ActT, ObsT]`) provide compile-time type safety
> - Each environment's concrete client class implements parsing step, observation, and state responses from the server into corresponding data models for the respective response.
> - Example: `CodingEnv(HTTPEnvClient[CodeAction, CodeObservation])`

We need a better naming convention here. I know that `CodingEnvClient` is a bit heavy, but I find the current convention of naming the server `CodingEnvironment` and the client `CodingEnv` misleading.

```python
    """Abstract base for container orchestration."""

    @abstractmethod
    def start_container(self, image: str, ...) -> str:
```

Since we call `.reset()` a lot, does it make sense to have a `.reset()` here too, e.g. to restart from a warmed-up image?

> - **Flexibility**: Environments can use internal state and context not visible to clients for reward computation
> - **Standard Pattern**: Aligns with Gymnasium/Gym conventions where rewards are returned from `step()`
>
> The `Observation` base class includes a `reward` field that environments populate:

Optional `reward` field.

> These three APIs establish the minimum viable interface for environment interaction and are sufficient for basic RL training workflows. They align with established patterns from Gymnasium and similar frameworks, making them immediately familiar to practitioners.
>
> **Scope**: This RFC focuses exclusively on these baseline APIs. Additional APIs (e.g., `render()`, `seed()`, `close()`, `tools()` and environment-specific utilities) will be explored in follow-up RFCs.

I think we should call this out in the very first line, so that the reader doesn't go "BUT WHAT ABOUT TOOLS".

```python
    metadata: Dict[str, Any] = field(default_factory=dict)
```

> This design enables environments to compute rewards based on:

"Note that the environment is just the place where these are returned, not necessarily where they are computed. For example, we recommend that you RPC to a GPU machine hosting your reward model"


(This raises the next question: what standard should these RPCs follow so that this code is shareable?)
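The server-side points raised in this review can be combined into one minimal sketch: hidden information kept out of what is exposed, an optional `reward`, and rewards returned by the environment but possibly computed elsewhere. It is not the RFC's code. `Observation`, `State`, `Environment`, `GuessEnvironment`, the `done`, `episode_id` and `step_count` fields, and the injected `reward_fn` (a local stand-in for something like an RPC to a reward-model server) are all hypothetical. Actions are plain `int`s to keep it short.

```python
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class Observation:
    done: bool = False
    reward: Optional[float] = None  # optional: a step may carry no reward
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class GuessObservation(Observation):
    hint: str = ""


@dataclass
class State:
    episode_id: str = ""
    step_count: int = 0


class Environment(ABC):
    @abstractmethod
    def reset(self) -> Observation:
        """Start a new episode and return the first observation."""

    @abstractmethod
    def step(self, action: int) -> Observation:
        """Apply one action and return the resulting observation."""

    @property
    @abstractmethod
    def state(self) -> State:
        """Episode bookkeeping (not the hidden game state)."""


# Hypothetical reward hook: (guess, secret) -> reward. In practice this could wrap
# a call to a separate reward service; here it is a plain local function.
RewardFn = Callable[[int, int], float]


def exact_match_reward(guess: int, secret: int) -> float:
    return 1.0 if guess == secret else 0.0


class GuessEnvironment(Environment):
    def __init__(self, secret: int, reward_fn: RewardFn = exact_match_reward) -> None:
        # Hidden information, private by convention only: never copied into State or Observation.
        self._secret = secret
        self._reward_fn = reward_fn
        self._state = State()

    def reset(self) -> Observation:
        self._state = State(episode_id=str(uuid.uuid4()), step_count=0)
        return GuessObservation(hint="guess a number")

    def step(self, action: int) -> Observation:
        self._state.step_count += 1
        if action == self._secret:
            hint = "correct"
        else:
            hint = "higher" if action < self._secret else "lower"
        reward = self._reward_fn(action, self._secret)  # returned here, computed by the hook
        return GuessObservation(hint=hint, done=action == self._secret, reward=reward)

    @property
    def state(self) -> State:
        return self._state
```

Keeping `_secret` out of both `State` and the observation is one way to handle the imperfect-information case raised above. Because Python will not enforce the underscore convention, in a client/server deployment the effective boundary is arguably whatever the server serializes over HTTP.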

@Darktex Darktex merged commit 9f03488 into main Oct 17, 2025
1 check passed
@Darktex
Contributor

Darktex commented Oct 17, 2025

Merging this to move quicker. Will refactor.


Labels

CLA Signed, RFC

5 participants