RLM: re-enable Sandboxes for both Python and Bash #776

Merged
snimu merged 11 commits into main from sebastian/rlm-sandboxes-2026-01-23
Jan 24, 2026

@snimu snimu commented Jan 23, 2026

Description

Add sandbox backend for RLMEnv with tunnel routing, sandbox workers, and filesystem sync

This PR adds a sandbox execution backend to RLMEnv, so that Python and bash REPLs can run inside Prime Sandboxes while sub‑LLM calls and root tools stay local and are reached through a Prime Tunnel. It introduces sandbox lifecycle helpers, sandbox worker templates (no PTY/jail), deferred sandbox setup for envs that mutate the local staging FS before execution, and robust cleanup behavior. It also fixes the bash root‑tool helper headers and ensures that sandbox bash answers are captured correctly. When retain_filesystem_after_rollout=True, the sandbox filesystem is synced back to local so that reward functions can evaluate local state.
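To illustrate the deferred setup and the sync-back described above, here is a minimal sketch rather than the code from this PR. The class `DeferredSandboxSketch` and its methods are hypothetical, and a temporary directory stands in for the sandbox filesystem. Only the `retain_filesystem_after_rollout` name comes from the description.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


class DeferredSandboxSketch:
    """Hypothetical stand-in for a sandbox executor; not the PR's implementation."""

    def __init__(self, staging_dir: Path, retain_filesystem_after_rollout: bool = False):
        self.staging_dir = staging_dir
        self.retain_filesystem_after_rollout = retain_filesystem_after_rollout
        # A temp dir plays the role of the remote sandbox filesystem.
        self.remote_dir: Optional[Path] = None

    def _ensure_started(self) -> Path:
        # Setup is deferred to the first execution, so changes made to the
        # local staging directory before that point are part of the upload.
        if self.remote_dir is None:
            self.remote_dir = Path(tempfile.mkdtemp(prefix="sandbox_sketch_"))
            shutil.copytree(self.staging_dir, self.remote_dir, dirs_exist_ok=True)
        return self.remote_dir

    def write_file(self, name: str, content: str) -> None:
        # Stands in for code that runs inside the sandbox and touches its FS.
        remote = self._ensure_started()
        (remote / name).write_text(content)

    def cleanup(self) -> None:
        if self.remote_dir is None:
            return  # nothing was ever started, so there is nothing to tear down
        if self.retain_filesystem_after_rollout:
            # Sync the sandbox state back so local reward functions can read it.
            shutil.copytree(self.remote_dir, self.staging_dir, dirs_exist_ok=True)
        shutil.rmtree(self.remote_dir, ignore_errors=True)
        self.remote_dir = None
```

In this sketch, an env can keep writing into `staging_dir` after constructing the object; the upload only happens on the first `write_file` call, and `cleanup()` is safe to call even if no code was ever executed.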

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Note

Enable sandbox execution for RLMEnv

  • New execution_backend option (local | sandbox) with SandboxRLMExecutor handling sandbox lifecycle, command exec, retries, and cleanup (via SandboxExecutorMixin)
  • Sandbox worker scripts for both bash and python (no PTY/jail), with proper root‑tool invocation and state handling; deferred setup until first code exec
  • Prime Tunnel routing for sub‑LLM/root‑tool interception when sandboxed (unless interception_url is provided); adds HTTP headers to root‑tool calls
  • Filesystem provisioning/sync: upload local context to sandbox, set remote paths, and optionally download back when retain_filesystem_after_rollout=True
  • Docs updated (docs/environments.md, experimental README) to describe sandbox mode and parameters; comprehensive tests added for backend selection, worker rendering, tunnel routing, cleanup, and FS provisioning
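The routing rule in the summary above (tunnel when sandboxed, unless `interception_url` is provided) can be sketched as a small pure function. This is an assumption-laden illustration: `resolve_interception_url`, `local_url`, and `tunnel_url` are hypothetical names, and treating an explicit `interception_url` as an override for the local backend as well is a guess, not something the PR states.

```python
from typing import Optional

# The two backend values named in the PR summary.
VALID_BACKENDS = ("local", "sandbox")


def resolve_interception_url(
    execution_backend: str,
    local_url: str,
    interception_url: Optional[str] = None,
    tunnel_url: Optional[str] = None,
) -> str:
    """Pick the URL that sub-LLM and root-tool calls should be sent to (hypothetical helper)."""
    if execution_backend not in VALID_BACKENDS:
        raise ValueError(f"unknown execution_backend: {execution_backend!r}")
    # Assumption: an explicitly provided URL always wins.
    if interception_url is not None:
        return interception_url
    if execution_backend == "local":
        return local_url
    # Sandboxed code cannot reach the local server directly, so it needs the tunnel.
    if tunnel_url is None:
        raise RuntimeError("sandbox backend needs a tunnel URL or an explicit interception_url")
    return tunnel_url
```

Keeping the decision in one function makes it straightforward to unit-test backend selection and tunnel routing without starting a sandbox.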

Written by Cursor Bugbot for commit 8b3304e. This will update automatically on new commits.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@snimu snimu merged commit b1baa6c into main Jan 24, 2026
6 checks passed