Code Execution using Morph Cloud #614
Thanks a lot for the nice feature @abetlen! Overall the changes LGTM, with some small suggestions to align the function args.
To use Morph, first install the morphcloud package:

```shell
pip install morphcloud
```
WDYT about adding this to the `code` extra in `setup.py` so users can just run `uv pip install -e '.[code]'`?
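A minimal sketch of what such a `code` extra could look like. Only the extra name `code` and the `morphcloud` package come from this thread; the other entries and the `install_hint` helper are illustrative placeholders, not the project's actual dependency pins.

```python
# Hypothetical excerpt in the style of a setup.py extras table.
# The dependency list is a placeholder, not open-r1's real pins.
extras = {
    "code": ["e2b-code-interpreter", "python-dotenv", "morphcloud"],
}


def install_hint(extra: str) -> str:
    """Return the editable-install command for a given extra name."""
    if extra not in extras:
        raise KeyError(f"unknown extra: {extra}")
    return f"uv pip install -e '.[{extra}]'"


# In a real setup.py the table would be passed as setup(..., extras_require=extras).
print(install_hint("code"))
```

With the package listed in the extra, a single install command covers every code-execution backend instead of a separate `pip install morphcloud` step.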
if not all(v["language"] == language for v in verification_info):
    raise ValueError("All verification_info must have the same language", verification_info)

if enforce_same_language:
Nice check!
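For reference, the guarded check can be exercised in isolation. This is a sketch only: `check_languages` is a hypothetical helper name, and the real reward function does this inline.

```python
def check_languages(verification_info: list, enforce_same_language: bool = False) -> list:
    """Return the per-sample languages, optionally requiring them all to match."""
    languages = [v["language"] for v in verification_info]
    if enforce_same_language and languages:
        first = languages[0]
        if not all(lang == first for lang in languages):
            # Same error shape as the check in the diff above.
            raise ValueError("All verification_info must have the same language", verification_info)
    return languages


mixed = [{"language": "python"}, {"language": "cpp"}]
print(check_languages(mixed))  # mixed batches pass when the flag is off
```

Passing `enforce_same_language=True` with the same `mixed` batch raises `ValueError`.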
src/open_r1/rewards.py (outdated)
@@ -398,19 +407,32 @@ def binary_code_reward(completions, num_parallel: int = 2, e2b_router_url=None,
     return output


-def code_reward(completions, num_parallel: int = 2, e2b_router_url=None, **kwargs) -> list[float]:
-    """Reward function that evaluates code snippets using the E2B code interpreter.
+# This function has been removed as we now use a unified approach in code_reward
I think we can remove these comments and `code_reward_morph`, since it doesn't seem to be used anywhere?
src/open_r1/rewards.py (outdated)
@@ -384,8 +386,15 @@ def extract_code(completion: str, language: str = "python") -> str:
     return extracted_answer


-def binary_code_reward(completions, num_parallel: int = 2, e2b_router_url=None, **kwargs) -> list[float]:
-    rewards = code_reward(completions, num_parallel=num_parallel, e2b_router_url=e2b_router_url, **kwargs)
+def binary_code_reward(completions, num_parallel: int = 2, e2b_router_url=None, provider_type: str = None, enforce_same_language: bool = False, **kwargs) -> list[float]:
I realise this is a breaking change, but maybe it's better to do this than have a router URL arg for E2B and then have users remember they need `kwargs` for Morph:
Suggested change:
-def binary_code_reward(completions, num_parallel: int = 2, e2b_router_url=None, provider_type: str = None, enforce_same_language: bool = False, **kwargs) -> list[float]:
+def binary_code_reward(completions, num_parallel: int = 2, provider_type: str = None, enforce_same_language: bool = False, **kwargs) -> list[float]:
The alternative would be to expose
src/open_r1/rewards.py (outdated)
    rewards = code_reward(
        completions,
        num_parallel=num_parallel,
        e2b_router_url=e2b_router_url,
Ditto here if you agree with the above change:
Suggested change:
-        e2b_router_url=e2b_router_url,
src/open_r1/rewards.py (outdated)
    return code_reward(completions, num_parallel=num_parallel, provider_type="morph", **kwargs)


def code_reward(completions, num_parallel: int = 2, e2b_router_url=None, provider_type: str = "e2b", enforce_same_language: bool = False, **kwargs) -> list[float]:
Suggested change:
-def code_reward(completions, num_parallel: int = 2, e2b_router_url=None, provider_type: str = "e2b", enforce_same_language: bool = False, **kwargs) -> list[float]:
+def code_reward(completions, num_parallel: int = 2, provider_type: str = "e2b", enforce_same_language: bool = False, **kwargs) -> list[float]:
src/open_r1/rewards.py (outdated)
    execution_provider = get_provider(
        provider_type=provider_type,
        num_parallel=num_parallel,
        e2b_router_url=e2b_router_url,
We should propagate `kwargs` here to use Morph, right?
Suggested change:
-        e2b_router_url=e2b_router_url,
+        **kwargs,
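To illustrate why forwarding `**kwargs` matters, here is a minimal sketch of a provider factory that picks out the router URL for whichever backend is selected. The two provider classes are simplified stand-ins written for this example, and the factory body is an assumption; only the names `get_provider`, `provider_type`, `num_parallel`, `e2b_router_url` and `morph_router_url` come from the PR.

```python
from typing import Optional


class E2BProvider:
    """Stand-in for the real E2B provider; it only stores its settings."""

    def __init__(self, num_parallel: int = 2, e2b_router_url: Optional[str] = None):
        self.num_parallel = num_parallel
        self.e2b_router_url = e2b_router_url


class MorphProvider:
    """Stand-in for the real Morph provider; it only stores its settings."""

    def __init__(self, num_parallel: int = 2, morph_router_url: Optional[str] = None):
        self.num_parallel = num_parallel
        self.morph_router_url = morph_router_url


def get_provider(provider_type: str = "e2b", **kwargs):
    """Build a provider, reading provider-specific options from kwargs."""
    num_parallel = kwargs.pop("num_parallel", 2)
    if provider_type == "e2b":
        return E2BProvider(num_parallel=num_parallel, e2b_router_url=kwargs.pop("e2b_router_url", None))
    if provider_type == "morph":
        return MorphProvider(num_parallel=num_parallel, morph_router_url=kwargs.pop("morph_router_url", None))
    raise ValueError(f"Unknown provider_type: {provider_type}")


provider = get_provider("morph", num_parallel=4, morph_router_url="http://localhost:8001")
print(type(provider).__name__, provider.morph_router_url)
```

If the caller passed only `e2b_router_url` explicitly, as in the hunk above, a `morph_router_url` supplied by the user would never reach the factory.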
scripts/morph_router.py (outdated)
        request_timeout (int): The maximum allowed time for the entire batch request in seconds.
    """
    scripts: List[str]
    language: str
One assumption I made when I wrote the E2B router was that all scripts were in the same language. Should we relax this assumption here?
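One way to relax the single-language assumption is to carry one language per script in the request. The sketch below uses a plain dataclass rather than the router's actual request model; the `languages` field, the default timeout value and the length check are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BatchRequest:
    """Hypothetical batch payload with one language entry per script."""

    scripts: List[str]
    languages: List[str]
    request_timeout: int = 120  # seconds for the whole batch (illustrative default)

    def __post_init__(self) -> None:
        # Reject mismatched batches early, before any sandbox is started.
        if len(self.scripts) != len(self.languages):
            raise ValueError(
                f"got {len(self.scripts)} scripts but {len(self.languages)} languages"
            )


request = BatchRequest(scripts=["print(1)", "console.log(1)"], languages=["python", "javascript"])
print(list(zip(request.languages, request.scripts)))
```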
slurm/morph_router.slurm (outdated)
#SBATCH --error=logs/morph_router_%j.err

# Load necessary modules
module load anaconda
I understand that you were testing this with your own conda env, but for the PR can you copy-paste the `e2b_router.slurm` file and include your `python scripts/morph_router.py --port 8001 --max_num_sandboxes 20` command? This will only work on our cluster, but at least the slurm files will be consistent across the project.
src/open_r1/utils/routed_morph.py (outdated)
        self.request_timeout = request_timeout

    def run_code(self, scripts: List[str],
                 language: str = "python",
As mentioned above, perhaps we can relax the language here and have a list of languages?
LGTM in general, we have some slow_tests for the code rewards / routers that are not part of the CI in https://github.com/huggingface/open-r1/blob/main/tests/slow/test_code_reward.py
If you could update / add an example there it would be appreciated so we can validate it all works on the cluster.
@@ -41,7 +41,7 @@ def __init__(self, router_url: str):
     def run_code(
         self,
         scripts: list[str],
-        language: str = "python",
+        languages: Optional[List[str]] = None,
i made the assumption that i should change the e2b router to support `languages` as well, to keep the API consistent between the two, but i can also leave the old `language` parameter for backwards compatibility if that's preferred
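If backwards compatibility is preferred, both parameters can coexist for a while. The following is a sketch under assumptions: `resolve_languages` is a hypothetical helper, and the `"python"` default mirrors the old `language` default shown in the diff.

```python
import warnings
from typing import List, Optional


def resolve_languages(
    scripts: List[str],
    language: Optional[str] = None,
    languages: Optional[List[str]] = None,
) -> List[str]:
    """Map the old single `language` and the new `languages` list onto one list."""
    if language is not None and languages is not None:
        raise ValueError("pass either `language` or `languages`, not both")
    if languages is None:
        if language is not None:
            warnings.warn("`language` is deprecated; use `languages`", DeprecationWarning, stacklevel=2)
        # Old behaviour: every script shares one language, "python" by default.
        return [language or "python"] * len(scripts)
    if len(languages) != len(scripts):
        raise ValueError("`languages` must have one entry per script")
    return list(languages)


print(resolve_languages(["print(1)", "console.log(1)"], languages=["python", "javascript"]))
```

A `run_code` method could call this helper first, so existing callers that still pass `language="python"` keep working while new callers move to the list form.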
     def test_python_code_reward(self):
         # requires E2B, see the README.md file
-        code_dataset = load_dataset("open-r1/verifiable-coding-problems-python_decontaminated-tested")
+        code_dataset = load_dataset("open-r1/verifiable-coding-problems-python_decontaminated-tested-shuffled")
couldn't find this dataset without the `-shuffled` suffix so i added this
         assert result.error is None
-        assert "hello" in result.stdout or "4" in result.stdout
+        assert "hello" in result.logs['stdout'][0] or "4" in result.logs['stdout'][0]
also changed a bit of the handling here: maybe something has changed in the Execution object, but i wasn't able to access the `.exit_code` attribute, nor `.stdout`, without the changes i made here
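Since the shape of the result object seems to vary, a test could read stdout defensively. This is only a sketch: `stdout_lines` is a hypothetical helper, and the shapes it handles (a `logs` attribute or key holding a `stdout` list or string) are assumptions based on the assertion in the diff above, not a statement about the E2B API.

```python
from types import SimpleNamespace
from typing import Any, List


def stdout_lines(result: Any) -> List[str]:
    """Best-effort extraction of stdout lines from an execution result."""
    logs = result.get("logs") if isinstance(result, dict) else getattr(result, "logs", None)
    if logs is None:
        return []
    stdout = logs.get("stdout") if isinstance(logs, dict) else getattr(logs, "stdout", None)
    if stdout is None:
        return []
    if isinstance(stdout, str):
        return [stdout]
    return list(stdout)


# Shape used by the updated assertion: an object whose `logs` is a dict.
result = SimpleNamespace(error=None, logs={"stdout": ["hello\n"]})
print(any("hello" in line for line in stdout_lines(result)))
```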
Code Execution using Morph Cloud
Overview
This PR introduces Morph Cloud as an optional backend for code execution within Open-R1, providing users with an alternative provider for Codeforces as well as IOI code reward calculation.
Key Changes & Design
The integration supports two distinct execution pathways, chosen via configuration:
1. General Code Execution (`code_reward`)
- Mechanism: Integrated via `src/open_r1/utils/code_providers.py::MorphProvider`. This provider supports two operational modes:
  - Router Mode: If `morph_router_url` is provided in the configuration, the provider uses a client (`src/open_r1/utils/routed_morph.py::RoutedMorphSandbox`) to communicate with a separate router service (`scripts/morph_router.py`). This pattern aims to improve performance and manage costs via sandbox reuse, similar to the E2B router.
  - Direct Mode: If `morph_router_url` is not provided, `MorphProvider` falls back to interacting directly with the morphcloud SDK (`MorphCloudClient`, `Sandbox`). It manages concurrent sandbox creation and execution using `asyncio` up to the limit specified by `num_parallel`.
- Configuration: Enabled by setting `provider_type: "morph"`. The `morph_router_url` field is optional; omitting it selects Direct Mode (`src/open_r1/configs.py`). A Slurm script (`slurm/morph_router.slurm`) is provided for launching the router service if Router Mode is chosen.
2. IOI Problem Execution (`ioi_code_reward`)
- Mechanism: Integrated via a custom client implementation (`src/open_r1/utils/ioi/morph_client.py::MorphCloudExecutionClient`). This client uses the lower-level APIs from the morphcloud library (`morphcloud.api.MorphCloudClient`) to manage Morph snapshots and instances directly. It orchestrates the complex setup required for IOI problems (multi-file uploads, custom compile/run scripts) for fine-grained control. This pathway always interacts directly with Morph Cloud and does not use the router.
- Configuration: Enabled by setting `ioi_provider: "morph"` in the training configuration (`src/open_r1/configs.py`).
Supporting Changes
- Changes to `src/open_r1/rewards.py` to dispatch to the appropriate Morph provider/mode based on configuration settings.
- `src/open_r1/configs.py`.
- `README.md` and a dedicated technical overview in `MORPH_INTEGRATION.md`.
- `recipes/`.
Dependencies & Usage
- `morphcloud` package (install via `uv pip install morphcloud`). [Note: Verify addition to project dependencies, e.g., in `setup.py`]
- `MORPH_API_KEY` environment variable. This key is used by the Morph router service (if used) and the direct clients (`MorphProvider` in Direct Mode, `MorphCloudExecutionClient` for IOI).
Backward Compatibility
- `code_reward` (`provider_type="e2b"`).
- `ioi_code_reward` (`ioi_provider="piston"`).
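The Router/Direct mode selection and the `num_parallel` concurrency limit described under Key Changes can be sketched as follows. This is an illustration under assumptions, not the code in `MorphProvider`: `select_mode`, `run_direct` and the `run_one` callback are hypothetical names, and a semaphore is just one common way to cap concurrency with `asyncio`.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


def select_mode(morph_router_url: Optional[str]) -> str:
    """Router Mode when a router URL is configured, Direct Mode otherwise."""
    return "router" if morph_router_url else "direct"


async def run_direct(
    scripts: List[str],
    run_one: Callable[[str], Awaitable[float]],
    num_parallel: int = 2,
) -> List[float]:
    """Run every script, keeping at most `num_parallel` executions in flight."""
    semaphore = asyncio.Semaphore(num_parallel)

    async def guarded(script: str) -> float:
        async with semaphore:  # blocks while num_parallel runs are active
            return await run_one(script)

    # gather preserves input order, so rewards line up with scripts.
    return list(await asyncio.gather(*(guarded(s) for s in scripts)))


async def fake_sandbox_run(script: str) -> float:
    """Placeholder for a real sandbox execution; returns a dummy reward."""
    await asyncio.sleep(0.01)
    return 1.0 if "print" in script else 0.0


if __name__ == "__main__":
    print(select_mode(None))
    print(asyncio.run(run_direct(["print(1)", "pass"], fake_sandbox_run, num_parallel=2)))
```

In Direct Mode each `run_one` call would create and use its own sandbox, which is why the limit is applied per execution rather than per batch.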