Code Execution using Morph Cloud #614

Open
wants to merge 16 commits into
base: main
from

Conversation

@abetlen abetlen commented Apr 17, 2025

Code Execution using Morph Cloud

Overview

This PR introduces Morph Cloud as an optional backend for code execution within Open-R1, providing users with an alternative provider for Codeforces as well as IOI code reward calculation.

Key Changes & Design

The integration supports two distinct execution pathways, chosen via configuration:

1. General Code Execution (code_reward)

Mechanism: Integrated via src/open_r1/utils/code_providers.py::MorphProvider. This provider supports two operational modes:

  • Router Mode: If morph_router_url is provided in the configuration, the provider uses a client (src/open_r1/utils/routed_morph.py::RoutedMorphSandbox) to communicate with a separate router service (scripts/morph_router.py). This pattern aims to improve performance and manage costs via sandbox reuse, similar to the E2B router.
  • Direct Mode: If morph_router_url is not provided, MorphProvider falls back to interacting directly with the morphcloud SDK (MorphCloudClient, Sandbox). It manages concurrent sandbox creation and execution using asyncio, up to the limit specified by num_parallel.

Configuration: Enabled by setting provider_type: "morph". The morph_router_url field is optional; omitting it selects Direct Mode (src/open_r1/configs.py). A Slurm script (slurm/morph_router.slurm) is provided for launching the router service if Router Mode is chosen.
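
A minimal sketch of the bounded-concurrency pattern that Direct Mode describes. It assumes a hypothetical `run_in_sandbox` coroutine in place of the real morphcloud SDK calls, which are not reproduced here. An `asyncio.Semaphore` caps the number of scripts in flight at `num_parallel`:

```python
import asyncio


async def run_in_sandbox(script: str) -> str:
    """Hypothetical stand-in for creating a sandbox and running one script."""
    await asyncio.sleep(0.01)  # simulates remote execution latency
    return f"ran:{script}"


async def run_all(scripts: list[str], num_parallel: int = 2) -> list[str]:
    # The semaphore ensures at most `num_parallel` scripts execute at once.
    semaphore = asyncio.Semaphore(num_parallel)

    async def bounded(script: str) -> str:
        async with semaphore:
            return await run_in_sandbox(script)

    # gather preserves input order, so results line up with `scripts`.
    return await asyncio.gather(*(bounded(s) for s in scripts))


results = asyncio.run(run_all(["a", "b", "c"], num_parallel=2))
```

Because `asyncio.gather` returns results in input order, each result can be matched back to its script even when executions finish out of order.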

2. IOI Problem Execution (ioi_code_reward)

Mechanism: Integrated via a custom client implementation (src/open_r1/utils/ioi/morph_client.py::MorphCloudExecutionClient). This client uses the lower-level APIs from the morphcloud library (morphcloud.api.MorphCloudClient) to manage Morph snapshots and instances directly. It orchestrates the complex setup required for IOI problems (multi-file uploads, custom compile/run scripts) for fine-grained control. This pathway always interacts directly with Morph Cloud and does not use the router.
Configuration: Enabled by setting ioi_provider: "morph" in the training configuration (src/open_r1/configs.py).

Supporting Changes

  • Updated core reward logic in src/open_r1/rewards.py to dispatch to the appropriate Morph provider/mode based on configuration settings.
  • Added necessary configuration fields to dataclasses in src/open_r1/configs.py.
  • Included comprehensive setup instructions and usage guidance in README.md and a dedicated technical overview in MORPH_INTEGRATION.md.
  • Updated relevant example configuration files in recipes/.
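
As a rough illustration of the configuration-driven dispatch, the sketch below uses a hypothetical `CodeRewardConfig` dataclass and `select_mode` helper. Neither is the actual code in src/open_r1/configs.py or src/open_r1/rewards.py; the field names and defaults mirror those stated in this PR description:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodeRewardConfig:
    """Hypothetical subset of the configuration fields, for illustration only."""

    provider_type: str = "e2b"  # E2B stays the default for code_reward
    morph_router_url: Optional[str] = None  # omitted -> Direct Mode
    ioi_provider: str = "piston"  # Piston stays the default for ioi_code_reward


def select_mode(config: CodeRewardConfig) -> str:
    """Return a label for the code_reward execution path implied by the config."""
    if config.provider_type == "morph":
        # A router URL selects Router Mode; otherwise fall back to Direct Mode.
        return "morph-router" if config.morph_router_url else "morph-direct"
    if config.provider_type == "e2b":
        return "e2b"
    raise ValueError(f"Unknown provider_type: {config.provider_type!r}")
```

With no fields set, `select_mode(CodeRewardConfig())` resolves to the E2B path, which matches the opt-in behaviour described in this PR.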

Dependencies & Usage

  • Requires the morphcloud package (install via uv pip install morphcloud). [Note: Verify addition to project dependencies, e.g., in setup.py]
  • Users must obtain a Morph API key and set it as the MORPH_API_KEY environment variable. This key is used by the Morph router service (if used) and the direct clients (MorphProvider in Direct Mode, MorphCloudExecutionClient for IOI).
  • Activation is opt-in via configuration flags.
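
A small sketch of failing early when the key is missing. The helper name `get_morph_api_key` is hypothetical and not part of this PR; only the MORPH_API_KEY variable name comes from the description above:

```python
import os


def get_morph_api_key() -> str:
    """Read MORPH_API_KEY from the environment, failing early with a clear message."""
    api_key = os.environ.get("MORPH_API_KEY")
    if not api_key:
        raise RuntimeError("MORPH_API_KEY is not set; export it before selecting the Morph provider.")
    return api_key
```

Checking the key before any sandbox is requested surfaces a misconfiguration at startup rather than partway through a training run.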

Backward Compatibility

  • This change is fully backward compatible.
  • E2B remains the default provider for code_reward (provider_type="e2b").
  • Piston remains the default provider for ioi_code_reward (ioi_provider="piston").
  • Existing workflows using the default providers are unaffected.

Member

@lewtun lewtun left a comment

Thanks a lot for the nice feature, @abetlen! Overall the changes LGTM, with some small suggestions to align the function args.

To use Morph, first install the morphcloud package:

```shell
pip install morphcloud
```

Member

WDYT about adding this to the code extra in setup.py so users can just run uv pip install -e '.[code]'?

if not all(v["language"] == language for v in verification_info):
    raise ValueError("All verification_info must have the same language", verification_info)

if enforce_same_language:

Member

Nice check!

@@ -398,19 +407,32 @@ def binary_code_reward(completions, num_parallel: int = 2, e2b_router_url=None,
    return output


def code_reward(completions, num_parallel: int = 2, e2b_router_url=None, **kwargs) -> list[float]:
    """Reward function that evaluates code snippets using the E2B code interpreter.
# This function has been removed as we now use a unified approach in code_reward

Member

I think we can remove these comments and code_reward_morph since it doesn't seem to be used anywhere?

@@ -384,8 +386,15 @@ def extract_code(completion: str, language: str = "python") -> str:
     return extracted_answer


-def binary_code_reward(completions, num_parallel: int = 2, e2b_router_url=None, **kwargs) -> list[float]:
-    rewards = code_reward(completions, num_parallel=num_parallel, e2b_router_url=e2b_router_url, **kwargs)
+def binary_code_reward(completions, num_parallel: int = 2, e2b_router_url=None, provider_type: str = None, enforce_same_language: bool = False, **kwargs) -> list[float]:

Member

I realise this is a breaking change, but maybe it's better to do this than have a router URL arg for E2B and then have users remember they need kwargs for Morph:

Suggested change
-def binary_code_reward(completions, num_parallel: int = 2, e2b_router_url=None, provider_type: str = None, enforce_same_language: bool = False, **kwargs) -> list[float]:
+def binary_code_reward(completions, num_parallel: int = 2, provider_type: str = None, enforce_same_language: bool = False, **kwargs) -> list[float]:

The alternative would be to expose

rewards = code_reward(
    completions,
    num_parallel=num_parallel,
    e2b_router_url=e2b_router_url,

Member

Ditto here if you agree with the above change:

Suggested change
-e2b_router_url=e2b_router_url,

    return code_reward(completions, num_parallel=num_parallel, provider_type="morph", **kwargs)


def code_reward(completions, num_parallel: int = 2, e2b_router_url=None, provider_type: str = "e2b", enforce_same_language: bool = False, **kwargs) -> list[float]:

Member

Suggested change
-def code_reward(completions, num_parallel: int = 2, e2b_router_url=None, provider_type: str = "e2b", enforce_same_language: bool = False, **kwargs) -> list[float]:
+def code_reward(completions, num_parallel: int = 2, provider_type: str = "e2b", enforce_same_language: bool = False, **kwargs) -> list[float]:

execution_provider = get_provider(
    provider_type=provider_type,
    num_parallel=num_parallel,
    e2b_router_url=e2b_router_url,

Member

We should propagate kwargs here to use Morph right?

Suggested change
-e2b_router_url=e2b_router_url,
+**kwargs,
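
To illustrate the forwarding pattern under discussion, here is a minimal sketch with simplified stand-ins for `code_reward` and `get_provider`. The function bodies, the dict return values, and the example router URL are all hypothetical; only the `**kwargs` propagation matters:

```python
def get_provider(provider_type: str = "e2b", num_parallel: int = 2, **kwargs) -> dict:
    """Simplified stand-in: records which provider-specific options it received."""
    return {"provider_type": provider_type, "num_parallel": num_parallel, "options": kwargs}


def code_reward(completions, num_parallel: int = 2, provider_type: str = "e2b", **kwargs) -> dict:
    # Forwarding **kwargs lets provider-specific options (e.g. a router URL)
    # reach the provider without a dedicated argument per backend.
    return get_provider(provider_type=provider_type, num_parallel=num_parallel, **kwargs)


# The URL below is a placeholder, not a real endpoint.
provider = code_reward(["print(1)"], provider_type="morph", morph_router_url="http://localhost:8001")
```

With this shape, the signature no longer needs a backend-specific `e2b_router_url` argument, at the cost of a breaking change for callers that pass it positionally.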

@lewtun lewtun requested a review from edbeeching April 29, 2025 10:08

        request_timeout (int): The maximum allowed time for the entire batch request in seconds.
    """
    scripts: List[str]
    language: str

Collaborator

One assumption I made when I wrote the E2B router was that all scripts were in the same language. Should we relax this assumption here?

#SBATCH --error=logs/morph_router_%j.err

# Load necessary modules
module load anaconda

Collaborator

@edbeeching edbeeching Apr 29, 2025

I understand that you were testing this with your own conda env, but for the PR, can you copy-paste the e2b_router.slurm file and include your python scripts/morph_router.py --port 8001 --max_num_sandboxes 20 command? This will only work on our cluster, but at least the Slurm files will be consistent across the project.

        self.request_timeout = request_timeout

    def run_code(self, scripts: List[str],
                 language: str = "python",

Collaborator

As mentioned above, perhaps we can relax the language here and have a list of languages?

Collaborator

@edbeeching edbeeching left a comment

LGTM in general. We have some slow_tests for the code rewards / routers, which are not part of the CI, in https://github.com/huggingface/open-r1/blob/main/tests/slow/test_code_reward.py

If you could update / add an example there it would be appreciated so we can validate it all works on the cluster.

@@ -41,7 +41,7 @@ def __init__(self, router_url: str):
     def run_code(
         self,
         scripts: list[str],
-        language: str = "python",
+        languages: Optional[List[str]] = None,

i made the assumption that i should change the e2b router to support languages as well to keep the API consistent between the two but i can also leave the old language parameter for backwards compatibility if that's preferred
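
One way to keep both arguments during a transition is sketched below. The helper `resolve_languages` is hypothetical and not part of this PR; it only illustrates how a per-script `languages` list could coexist with the old single `language` parameter:

```python
from typing import List, Optional


def resolve_languages(
    scripts: List[str],
    language: str = "python",
    languages: Optional[List[str]] = None,
) -> List[str]:
    """Return one language per script, accepting either the old or the new argument."""
    if languages is None:
        # Old behaviour: a single language applies to every script.
        return [language] * len(scripts)
    if len(languages) != len(scripts):
        raise ValueError("languages must have one entry per script")
    return list(languages)
```

Existing callers that pass only `language` keep working, while new callers can supply a mixed-language batch.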

     def test_python_code_reward(self):
         # requires E2B, see the README.md file
-        code_dataset = load_dataset("open-r1/verifiable-coding-problems-python_decontaminated-tested")
+        code_dataset = load_dataset("open-r1/verifiable-coding-problems-python_decontaminated-tested-shuffled")

couldn't find this dataset without the -shuffled suffix so i added this

         assert result.error is None
-        assert "hello" in result.stdout or "4" in result.stdout
+        assert "hello" in result.logs['stdout'][0] or "4" in result.logs['stdout'][0]

also changed a bit of the handling here: maybe something has changed in the Execution object but i wasn't able to access the .exit_code attribute, nor .stdout without the changes i made here

4 participants