
## Overview

[OpenEnv](https://github.com/meta-pytorch/OpenEnv) is an open-source framework from Meta's PyTorch team for defining, deploying, and interacting with environments in reinforcement learning (RL) and agentic workflows. It offers [Gymnasium-style APIs](https://gymnasium.farama.org) (e.g., `reset()` and `step()`) to interface with environments in a standard manner, and supports running these environments as backend servers (for example, via HTTP or containerised execution). You can find a collection of ready-to-use OpenEnv environments on the [Hugging Face Hub](https://huggingface.co/collections/openenv/environment-hub).

In this guide, we’ll focus on **how to integrate OpenEnv with TRL**, but feel free to explore the links above to dive deeper into OpenEnv itself.
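To make this concrete, here is a minimal sketch of the Gymnasium-style loop against the echo environment used throughout this guide. It assumes an environment server is already running locally on port 8001, and that the action's `message` field and the result's `observation`, `reward`, and `done` attributes match the echo environment's types:

```python
from envs.echo_env import EchoEnv, EchoAction

# Connect to an Echo environment server (assumed to be running on port 8001)
env = EchoEnv(base_url="http://0.0.0.0:8001")

result = env.reset()                                      # start a new episode
result = env.step(EchoAction(message="Hello, OpenEnv!"))  # act in the environment
print(result.observation, result.reward, result.done)    # inspect the outcome
```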


## Running the Environments

You can run OpenEnv environments in three different ways:

- Load the environment from the Hugging Face Hub and run it as a local Docker container.
- Launch the environment directly with Uvicorn as a local Python process (the approach you need on Google Colab, where Docker is not available).
- Connect to a hosted environment running on the Hugging Face Hub.

<hfoptions id="env_mode">

<hfoption id="docker">

**Load from Hugging Face Hub** *(recommended)*

You can use the `from_hub` method to load the environment from the Hub. This method automatically starts a Docker container for the environment on your local machine; `openenv/echo-env` is the `repo_id` of the Space on the Hub.

```python
from envs.echo_env import EchoEnv

env = EchoEnv.from_hub("openenv/echo-env")
```

If you want to launch the environment manually, you can use the following command to pull and run the Docker container:

```bash
docker run -d -p 8001:8000 --platform linux/amd64 registry.hf.space/openenv-echo-env:latest
```

And then you can connect to the environment using the following code:

```python
env = EchoEnv(base_url="http://0.0.0.0:8001")
```

Here, we map the container's port 8000 to local port 8001 so that port 8000 remains free on your machine for a vLLM server; adjust the port mapping as needed for your setup.

> [!NOTE]
> You can find the Docker command for any Space on the Hub:
>
> * Open the Space page on the Hub.
> * Click the **⋮ (three dots)** menu.
> * Select **“Run locally.”**
> * Copy and execute the provided command in your terminal.
>
> ![open_env_launch_docker](https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/open_env_launch_docker.png)

</hfoption>
<hfoption id="space">

**Connect to a remote Hugging Face Space**

You can connect to a hosted environment running on the Hugging Face Hub by passing the URL of the Space to the `base_url` parameter of the environment class. Keep in mind that public Spaces may have rate limits or temporarily go offline if inactive.

```python
env = EchoEnv(base_url="https://openenv-echo-env.hf.space")
```

> [!NOTE]
> You can find the connection URL of any Space on the Hub:
>
> * Open the Space page on the Hub.
> * Click the **⋮ (three dots)** menu.
> * Select **“Embed this Space.”**
> * Copy the connection URL.

</hfoption>

<hfoption id="local">

**Local Python process**

You can start the server manually as a local Python process. For more details about the available environments, refer to the [OpenEnv repository](https://github.com/meta-pytorch/OpenEnv/tree/main/src/envs).

```bash
hf download openenv/echo_env --repo-type=space --local-dir=echo_env
python -m uvicorn echo_env.src.envs.echo_env.server.app:app --host 0.0.0.0 --port 8001
```

And then you can connect to the environment using the following code:

```python
env = EchoEnv(base_url="http://0.0.0.0:8001")
```

</hfoption>

</hfoptions>

## A simple example

```python
from envs.echo_env import EchoEnv, EchoAction
from trl import GRPOConfig, GRPOTrainer

# Create HTTP client for Echo Environment
client = EchoEnv.from_hub("openenv/echo-env")

"""
Alternatively, you can start the environment manually with Docker and connect to it:

docker run -d -p 8001:8000 --platform linux/amd64 registry.hf.space/openenv-echo-env:latest

client = EchoEnv(base_url="http://0.0.0.0:8001")
"""
```

## Training a model to play Wordle

We will use the `textarena` environment to train a model to play Wordle.
Wordle is a useful game for training because it requires the model to reason about the word and the feedback provided by the environment. It is also a purely language-based game that requires no external tools or knowledge. Furthermore, we found that models from 1 billion parameters and up are able to improve at Wordle while needing only 8 tokens to generate a guess, which makes the game a good benchmark for experimenting with reinforcement learning environments without significant compute requirements.
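For instance, the small completion budget is a big part of what keeps rollouts cheap. Below is a minimal sketch of such a configuration; the values are illustrative assumptions, not the settings of the actual training script:

```python
from trl import GRPOConfig

# Illustrative values only: a single Wordle guess fits in a few tokens,
# so the completion budget can stay tiny.
config = GRPOConfig(
    output_dir="wordle-grpo",
    max_completion_length=8,  # enough tokens for one 5-letter guess
    num_generations=8,        # rollouts per prompt for GRPO's group baseline
)
```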

> [!NOTE] How does Wordle work?
> Wordle is a word-guessing game where the player has to guess a 5-letter word. The player can make 6 guesses, and for each guess, the environment will provide feedback on the correctness of the guess. The player wins if they guess the word in 6 guesses or fewer. It challenges the model to generate words that are likely to be correct, and to learn from the feedback provided by the environment.
>
> For example, if the wordle environment returns the following feedback:
>
> ```
> G U E S S
> X G Y X X
> ```
> The model has guessed the word "GUESS" and the environment has provided feedback using the letters X, G, and Y, which correspond to the blank, green, and yellow colors of the original game. From this feedback, the model should learn that the guess "GUESS" is incorrect, that the letter "E" is in the word but in the wrong position, and that the letter "U" is correct and in the correct position.
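To spell out the mechanics, here is a small self-contained sketch (a hypothetical helper, not part of the training script) of how such feedback can be interpreted programmatically:

```python
# Hypothetical helper: interpret Wordle-style feedback for a guess.
# "G" = right letter, right position; "Y" = right letter, wrong position;
# "X" = letter not in the word.
def interpret_feedback(guess: str, feedback: str) -> list[str]:
    hints = []
    for position, (letter, mark) in enumerate(zip(guess, feedback)):
        if mark == "G":
            hints.append(f"'{letter}' is correct at position {position + 1}")
        elif mark == "Y":
            hints.append(f"'{letter}' is in the word but not at position {position + 1}")
        else:
            hints.append(f"'{letter}' is not in the word")
    return hints

print(interpret_feedback("GUESS", "XGYXX"))
# ["'G' is not in the word", "'U' is correct at position 2", ...]
```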

In the TextArena environment, a reward is only given when the model wins the game. The reward is 1.0 if the model wins, and 0.0 otherwise. This is not a very efficient reward signal for the model, so we have added a number of custom reward functions to the script to help the model learn to play the game. The extensible nature of `reward_funcs` and `rollout_func` allows you to add any custom reward function you want to the script.
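As an illustration, here is a minimal sketch of what a custom reward function can look like. It follows TRL's reward-function interface (a list of `completions` plus keyword arguments), but the `feedback` keyword and the letter-coverage scoring are hypothetical, not the rewards used in the script:

```python
# Hypothetical shaping reward: score the fraction of useful (G/Y) letters
# in the environment's feedback instead of relying only on win/loss.
def letter_coverage_reward(completions, feedback=None, **kwargs):
    feedback = feedback or [""] * len(completions)
    rewards = []
    for fb in feedback:
        useful = sum(1 for letter in fb if letter in "GY")
        rewards.append(useful / 5)  # a Wordle guess has 5 letters
    return rewards
```

You would then pass such a function alongside the environment's win/loss signal via `reward_funcs=[letter_coverage_reward, ...]`.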

### Rollout Function

To run the `textarena` environment locally, start its Docker container in the same way as before:

```bash
docker run -d -p 8001:8001 registry.hf.space/burtenshaw-textarena:latest
```

### Results

The resulting model improves its performance on the game, both by reducing the number of repetitions and by increasing the number of correct guesses. However, the Qwen3-1.7B model we trained is not able to consistently win the game. The following reward curve shows the coverage of the model's guesses and the coverage of correct Y and G letters.

<iframe src="https://burtenshaw-wordle-grpo.hf.space?project=group-Qwen-Qwen3-17B&metrics=reward&runs=run-2025-10-26_09-39-49,run-2025-10-26_08-04-49&sidebar=hidden&navbar=hidden" style="width:1600px; height:500px; border:0;"></iframe>

We experimented with larger models like `gpt-oss-20b` and found that the model was able to consistently win the game. However, this requires a lot of compute to train the model. Why not try this out yourself?