[RFC 001] - Baseline API and Interface Specifications #26
updated the name to OpenEnv
Will we link this from the top-level README to ensure folks see it?
adding a pytorch logo :)
Adding an experimental warning to the readme.
Creating a PR to update naming on the Readme
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
adding the CoC..
```
┌─────────────────────────────────────────────────────────┐
│ RL code(Client Application) │
│ RL code(Client Application) │
```
double line
│ │ (HTTPEnvClient)│ │ (HTTPEnvClient) │ │
│ └────────┬───────┘ └────────┬─────────┘ │
└───────────┼───────────────────────────────┼─────────────┘
│ HTTP (reset, step, state) │ HTTP
I'm not sure we should expose state, since the model is that you keep it private and only return what the agent is allowed to see in the observation. If you are playing chess or having a 1:1 conversation, you are allowed to see everything, so it doesn't matter. But it does matter in many real-life applications that involve imperfect information (e.g. poker, where you don't see other players' hands, or a driving sim, where some cars move out of your view because they are occluded by buildings or other cars).
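To illustrate the point, here is a minimal sketch assuming a toy two-player card game: the environment keeps the full state (both hands) internally, while `reset()` and `step()` return only what player 0 may see. `CardState`, `PlayerObservation`, and `ToyCardEnv` are hypothetical names for illustration, not OpenEnv APIs.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class CardState:
    """Full (hidden) game state, known only to the environment."""
    hands: List[List[int]] = field(default_factory=list)
    step_count: int = 0


@dataclass
class PlayerObservation:
    """What player 0 is allowed to see: its own hand, not the opponent's."""
    own_hand: List[int]
    opponent_card_count: int
    step_count: int


class ToyCardEnv:
    """Hypothetical imperfect-information environment (illustration only)."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._state = CardState()

    def reset(self) -> PlayerObservation:
        deck = list(range(1, 11))
        self._rng.shuffle(deck)
        # Deal three cards to each of two players; only the env sees both hands.
        self._state = CardState(hands=[deck[:3], deck[3:6]], step_count=0)
        return self._observe()

    def step(self, discard_index: int) -> PlayerObservation:
        # Player 0 discards one card from its own hand.
        self._state.hands[0].pop(discard_index)
        self._state.step_count += 1
        return self._observe()

    def _observe(self) -> PlayerObservation:
        # Project the full state down to player 0's view.
        return PlayerObservation(
            own_hand=list(self._state.hands[0]),
            opponent_card_count=len(self._state.hands[1]),
            step_count=self._state.step_count,
        )
```

Under this pattern, a `state` endpoint that serialized `CardState` as-is would leak the opponent's hand, which is exactly the concern raised here.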

@property
@abstractmethod
def state(self) -> State:
Nothing in Python is really private, so I don't know how we would enforce this.
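Python indeed cannot enforce privacy, but a common convention limits accidental misuse: keep the real state in an underscore-prefixed attribute and expose it through a read-only property that returns a copy. A minimal sketch; the `State` fields and `GuardedEnv` are hypothetical:

```python
import copy
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class State:
    """Hypothetical episode state (illustration only)."""
    episode_id: str = ""
    step_count: int = 0
    scratch: Dict[str, int] = field(default_factory=dict)


class GuardedEnv:
    def __init__(self) -> None:
        # Leading underscore: a convention meaning "internal", not enforced.
        self._state = State(episode_id="ep-0")

    @property
    def state(self) -> State:
        # Read-only property (no setter) that hands out a deep copy, so callers
        # cannot mutate the environment's internal state through it.
        return copy.deepcopy(self._state)
```

This only guards against accidental mutation: `env._state` is still reachable in-process. In the HTTP client/server split, what actually keeps hidden information away from the agent is the server choosing not to send it.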
#### 1. Environment (Server-Side)

```python
class Environment(ABC):
```
If we add a way of discovering actions (perhaps the topic of another RFC), it will have to backpropagate here.
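For reference, a minimal sketch of the server-side contract discussed in this RFC (`reset()`, `step()`, and an abstract `state` property), using `abc` so that a subclass missing any of them cannot be instantiated. The dataclass fields (`done`, `reward`, `episode_id`, `step_count`) and `ThreeStepEnvironment` are assumptions for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Action:
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Observation:
    # Assumed fields; the RFC excerpt only shows `metadata` and a reward field.
    done: bool = False
    reward: Optional[float] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class State:
    episode_id: Optional[str] = None
    step_count: int = 0


class Environment(ABC):
    """Server-side base class: reset/step/state."""

    @abstractmethod
    def reset(self) -> Observation:
        """Start a new episode and return the initial observation."""

    @abstractmethod
    def step(self, action: Action) -> Observation:
        """Apply an action and return the resulting observation."""

    @property
    @abstractmethod
    def state(self) -> State:
        """Return episode metadata."""


class ThreeStepEnvironment(Environment):
    """Hypothetical toy environment whose episodes end after three steps."""

    def __init__(self) -> None:
        self._state = State()

    def reset(self) -> Observation:
        self._state = State(episode_id="episode-1", step_count=0)
        return Observation()

    def step(self, action: Action) -> Observation:
        self._state.step_count += 1
        done = self._state.step_count >= 3
        return Observation(done=done, reward=1.0 if done else 0.0,
                           metadata=dict(action.metadata))

    @property
    def state(self) -> State:
        return self._state
```

If action discovery were added as another abstract member, every concrete environment would have to implement it, which is the propagation the comment anticipates.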
- Generic types (`Generic[ActT, ObsT]`) provide static type safety (checked by type checkers such as mypy, not enforced at runtime)
- Each environment's concrete client class parses step, observation, and state responses from the server into the corresponding data models.
- Example: `CodingEnv(HTTPEnvClient[CodeAction, CodeObservation])`
We need a better naming convention here. I know that `CodingEnvClient` is a bit heavy, but I find the current convention of naming the server `CodingEnvironment` and the client `CodingEnv` deceptive/confusing.
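A minimal sketch of the client-side typing pattern, assuming server responses arrive as JSON dicts. `HTTPEnvClient`, `CodeAction`, and `CodeObservation` are names from the RFC text, but their fields, the `_step_payload`/`_parse_observation` hooks, and the injected `Transport` stub (standing in for real HTTP calls) are hypothetical. The concrete class is named `CodingEnvClient`, following the naming suggested in this comment.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable, Dict, Generic, TypeVar

ActT = TypeVar("ActT")
ObsT = TypeVar("ObsT")

# A transport takes (endpoint, JSON payload) and returns a JSON dict.
# A real client would send an HTTP request; here it is injected for testing.
Transport = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class HTTPEnvClient(ABC, Generic[ActT, ObsT]):
    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def reset(self) -> ObsT:
        return self._parse_observation(self._transport("reset", {}))

    def step(self, action: ActT) -> ObsT:
        payload = self._step_payload(action)
        return self._parse_observation(self._transport("step", payload))

    @abstractmethod
    def _step_payload(self, action: ActT) -> Dict[str, Any]:
        """Serialize a typed action into a JSON-compatible dict."""

    @abstractmethod
    def _parse_observation(self, payload: Dict[str, Any]) -> ObsT:
        """Parse a JSON response into a typed observation."""


@dataclass
class CodeAction:
    code: str


@dataclass
class CodeObservation:
    stdout: str
    exit_code: int


class CodingEnvClient(HTTPEnvClient[CodeAction, CodeObservation]):
    def _step_payload(self, action: CodeAction) -> Dict[str, Any]:
        return {"code": action.code}

    def _parse_observation(self, payload: Dict[str, Any]) -> CodeObservation:
        return CodeObservation(stdout=payload.get("stdout", ""),
                               exit_code=payload.get("exit_code", 0))
```

The type parameters let a type checker flag, for example, passing a non-`CodeAction` to `step()`; at runtime Python does not check them.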
"""Abstract base for container orchestration."""

@abstractmethod
def start_container(self, image: str, ...) -> str:
Since we call `.reset()` a lot, does it make sense to have a `.reset()` here too, e.g. to restart from a warmed-up image?
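One possible reading of this suggestion, sketched under assumptions: a provider keeps a pool of pre-started ("warmed-up") containers, so a hypothetical `reset_container()` can hand out a fresh one instead of cold-starting. `start_container` mirrors the excerpt's signature with the elided parameters omitted; `stop_container`, `reset_container`, `WarmPoolProvider`, and the pool logic are hypothetical, and no real container runtime is called (ids are generated strings).

```python
import itertools
from abc import ABC, abstractmethod
from collections import deque
from typing import Deque, Dict


class ContainerProvider(ABC):
    """Abstract base for container orchestration (sketch)."""

    @abstractmethod
    def start_container(self, image: str) -> str:
        """Start a container from `image` and return its id."""

    @abstractmethod
    def stop_container(self, container_id: str) -> None:
        """Stop and discard a container."""


class WarmPoolProvider(ContainerProvider):
    """Hypothetical provider that pre-starts containers so resets are cheap."""

    def __init__(self, image: str, pool_size: int = 2) -> None:
        self._image = image
        self._ids = itertools.count()
        self._running: Dict[str, str] = {}
        # Pre-start `pool_size` containers before any are requested.
        self._pool: Deque[str] = deque(
            self.start_container(image) for _ in range(pool_size))

    def start_container(self, image: str) -> str:
        container_id = f"{image}-{next(self._ids)}"
        self._running[container_id] = image
        return container_id

    def stop_container(self, container_id: str) -> None:
        self._running.pop(container_id, None)

    def reset_container(self, container_id: str) -> str:
        # Discard the used container, hand out a warm one, and refill the pool.
        self.stop_container(container_id)
        fresh = self._pool.popleft() if self._pool else self.start_container(self._image)
        self._pool.append(self.start_container(self._image))
        return fresh
```

The trade-off is idle resources: each pooled container is running before anyone needs it.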
- **Flexibility**: Environments can use internal state and context not visible to clients for reward computation
- **Standard Pattern**: Aligns with Gymnasium/Gym conventions where rewards are returned from `step()`

The `Observation` base class includes a `reward` field that environments populate:
Optional reward field.
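A minimal sketch of an `Observation` base with an optional reward, assuming `done` and `reward` fields alongside the `metadata` field shown in the RFC excerpt; the `CodeObservation` fields are hypothetical. Defaulting `reward` to `None` keeps "not scored on this step" distinguishable from a score of `0.0`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Observation:
    """Base observation; `reward` is optional because not every step is scored."""
    done: bool = False
    reward: Optional[float] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class CodeObservation(Observation):
    # Subclass fields need defaults because the base class fields already have them.
    stdout: str = ""
    exit_code: int = 0
```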

These three APIs establish the minimum viable interface for environment interaction and are sufficient for basic RL training workflows. They align with established patterns from Gymnasium and similar frameworks, making them immediately familiar to practitioners.

**Scope**: This RFC focuses exclusively on these baseline APIs. Additional APIs (e.g., `render()`, `seed()`, `close()`, `tools()` and environment-specific utilities) will be explored in follow-up RFCs.
I think we should call this out in the very first line, so that the reader doesn't go "BUT WHAT ABOUT TOOLS?"
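The three calls are enough to drive a Gymnasium-style loop. A minimal sketch with a hypothetical number-guessing environment; the field names, the `hint` metadata key, and the binary-search policy are illustrative assumptions:

```python
import random
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Observation:
    done: bool = False
    reward: Optional[float] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class State:
    step_count: int = 0


class GuessNumberEnv:
    """Hypothetical toy environment: guess a hidden integer in [0, 9]."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._target = 0
        self._state = State()

    def reset(self) -> Observation:
        self._target = self._rng.randint(0, 9)
        self._state = State()
        return Observation()

    def step(self, guess: int) -> Observation:
        self._state.step_count += 1
        correct = guess == self._target
        # The reward is returned with the observation, Gymnasium-style.
        hint = "higher" if guess < self._target else "lower"
        return Observation(done=correct, reward=1.0 if correct else 0.0,
                           metadata={"hint": hint})

    @property
    def state(self) -> State:
        return self._state


def rollout(env: GuessNumberEnv, max_steps: int = 20) -> List[float]:
    """Run one episode with a binary-search policy and collect rewards."""
    low, high = 0, 9
    rewards: List[float] = []
    obs = env.reset()
    while not obs.done and env.state.step_count < max_steps:
        guess = (low + high) // 2
        obs = env.step(guess)
        rewards.append(obs.reward or 0.0)
        if obs.metadata.get("hint") == "higher":
            low = guess + 1
        else:
            high = guess - 1
    return rewards
```

The policy reads only the observation (reward and hint) and uses `state` solely for episode bookkeeping, so the hidden target never reaches the agent.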
```python
metadata: Dict[str, Any] = field(default_factory=dict)
```

This design enables environments to compute rewards based on:
"Note that the environment is just the place where these are returned, not necessarily where they are computed. For example, we recommend that you RPC to a GPU machine hosting your reward model"
(This brings the next question: what standard should said RPCs follow so that this code is shareable?)
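A minimal sketch of the note above: the environment only returns the reward, while scoring is delegated to an injected callable. In practice that callable might wrap an RPC to a machine hosting a reward model; here `RewardFn`, `ChatEnv`, and `length_penalty_reward` are hypothetical local stand-ins, and no particular RPC standard is assumed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

# Hypothetical signature for a reward service: (prompt, response) -> score.
RewardFn = Callable[[str, str], float]


@dataclass
class Observation:
    done: bool = False
    reward: Optional[float] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


class ChatEnv:
    """Hypothetical single-turn environment that delegates scoring."""

    def __init__(self, prompt: str, reward_fn: RewardFn) -> None:
        self._prompt = prompt
        # Could be a local function or a thin client for a remote reward model.
        self._reward_fn = reward_fn

    def reset(self) -> Observation:
        return Observation(metadata={"prompt": self._prompt})

    def step(self, response: str) -> Observation:
        score = self._reward_fn(self._prompt, response)
        return Observation(done=True, reward=score)


def length_penalty_reward(prompt: str, response: str) -> float:
    """Stand-in for a remote reward model: prefers short responses."""
    return 1.0 / (1.0 + len(response.split()))
```

Injecting the scorer keeps the environment code independent of where scoring runs; the wire format for a remote scorer remains the open question raised in the comment.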
Merging this to move quicker. Will refactor.
This PR is to discuss the OpenEnv 0.1 RFC, with a focus on
What is proposed here is already available on the master branch to try out, so we can gather feedback from the current experience.
NOTE: Extensions supporting observability and MCP tools will follow this baseline API spec RFC in order to ensure