feat: Add agent sandbox for isolated execution by steveyackey · Pull Request #121 · steveyackey/bloom

steveyackey · 2026-02-04T21:40:38Z

Summary

This PR adds a complete agent sandbox system for Bloom, enabling isolated execution of AI agents with filesystem, network, and process restrictions.

What was built

Sandbox module (apps/cli/src/sandbox/) with:
- Configuration resolution with sensible defaults (deny-all network, workspace-only writes, protected paths like ~/.ssh, ~/.aws, ~/.gnupg)
- Platform-specific backends for Linux (bubblewrap + socat) and macOS (sandbox-exec)
- Sandboxed spawn factory that wraps commands with srt (Anthropic sandbox-runtime)
- SandboxManager for multi-agent concurrent instances with lifecycle management
- Structured lifecycle logging with policy violation detection and reporting
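The config resolution step can be pictured roughly as follows. This is a minimal sketch, not the module's actual code: the `SandboxConfig` field names are borrowed from the option names mentioned in this PR, while `defaultSandboxConfig`, `resolveSandboxConfig`, and the choice to extend (rather than replace) the protected deny-read paths are assumptions made for illustration.

```typescript
// Hypothetical shapes for illustration; the real SandboxConfig in
// apps/cli/src/sandbox/ may differ.
type NetworkPolicy = "deny-all" | "allow-list";

interface SandboxConfig {
  enabled: boolean;
  networkPolicy: NetworkPolicy;
  allowedDomains: string[];
  writablePaths: string[];
  denyReadPaths: string[];
}

// Defaults described in the PR: sandbox off, no network, writes limited to
// the workspace, reads denied for common credential directories.
function defaultSandboxConfig(workspacePath: string): SandboxConfig {
  return {
    enabled: false,
    networkPolicy: "deny-all",
    allowedDomains: [],
    writablePaths: [workspacePath],
    denyReadPaths: ["~/.ssh", "~/.aws", "~/.gnupg"],
  };
}

// Merge user overrides over the defaults.
// Assumption: user-supplied deny-read paths extend the protected defaults
// instead of replacing them, so the credential directories stay denied.
function resolveSandboxConfig(
  workspacePath: string,
  overrides: Partial<SandboxConfig> = {},
): SandboxConfig {
  const defaults = defaultSandboxConfig(workspacePath);
  const merged = defaults.denyReadPaths.concat(overrides.denyReadPaths ?? []);
  return {
    ...defaults,
    ...overrides,
    // Drop duplicates while keeping the original order.
    denyReadPaths: merged.filter((p, i) => merged.indexOf(p) === i),
  };
}
```

For example, `resolveSandboxConfig("/work/agent-a", { enabled: true })` would keep the deny-all network policy and the three protected paths while turning the sandbox on.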

GenericAgentProvider integration - Agents automatically use sandboxed execution when configured via:

agent:
  claude:
    sandbox:
      enabled: true
      networkPolicy: allow-list
      allowedDomains:
        - github.com
        - api.anthropic.com
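To make the effect of this config concrete, here is a rough sketch of how an allow-list decision could be evaluated. The actual filtering is done by srt's proxy, whose matching rules are not described in this PR; `isHostAllowed`, `NetworkRules`, and the exact, case-insensitive match (no implied subdomains) are assumptions for illustration only.

```typescript
type NetworkPolicy = "deny-all" | "allow-list";

interface NetworkRules {
  networkPolicy: NetworkPolicy;
  allowedDomains: string[];
}

// Decide whether an outbound request to `host` would be permitted.
// Assumption: matching is exact and case-insensitive; subdomains are NOT
// implied by a parent domain entry.
function isHostAllowed(host: string, rules: NetworkRules): boolean {
  if (rules.networkPolicy === "deny-all") return false;
  const wanted = host.toLowerCase();
  return rules.allowedDomains.some((domain) => domain.toLowerCase() === wanted);
}

// Mirrors the YAML example above.
const claudeRules: NetworkRules = {
  networkPolicy: "allow-list",
  allowedDomains: ["github.com", "api.anthropic.com"],
};
```

Under these assumptions, `isHostAllowed("api.anthropic.com", claudeRules)` is true and `isHostAllowed("example.com", claudeRules)` is false.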

Comprehensive test suite - 133 sandbox-specific tests covering:
- Config resolution and srt settings generation
- Platform detection and backend selection
- Multi-agent concurrent execution
- Process tracking and cleanup
- Lifecycle event logging and violation parsing

Architecture decisions

Sandbox-per-agent model: Each agent gets its own isolated sandbox instance with independent configuration, enabling per-agent network policies and clean isolation between concurrent agents.
srt (Anthropic sandbox-runtime): Selected after evaluating bubblewrap, gVisor, and Firecracker. srt was chosen because:
- Only cross-platform option (macOS + Linux from single codebase)
- Network domain filtering via proxy (essential for allowing specific APIs)
- Rootless operation on all platforms
- Built specifically for AI agent sandboxing
Graceful fallback: When srt is not installed or the sandbox is not supported on the platform, agents run unsandboxed with a warning, so existing workflows do not break.
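The fallback decision described above boils down to a small amount of branching. The sketch below is illustrative only: `SandboxAvailability`, `ExecutionPlan`, `planExecution`, and the warning texts are hypothetical names and strings, not taken from the Bloom codebase.

```typescript
interface SandboxAvailability {
  enabled: boolean;           // sandbox.enabled from the agent config
  runtimeInstalled: boolean;  // whether srt could be found
  platformSupported: boolean; // Linux, macOS, or WSL2 per the PR
}

interface ExecutionPlan {
  mode: "sandboxed" | "unsandboxed";
  warning?: string;
}

// Never fail the run because the sandbox is missing: degrade to unsandboxed
// execution and surface a warning instead.
function planExecution(a: SandboxAvailability): ExecutionPlan {
  if (!a.enabled) return { mode: "unsandboxed" };
  if (!a.platformSupported) {
    return { mode: "unsandboxed", warning: "sandbox not supported on this platform; running unsandboxed" };
  }
  if (!a.runtimeInstalled) {
    return { mode: "unsandboxed", warning: "srt not installed; running unsandboxed" };
  }
  return { mode: "sandboxed" };
}
```

Keeping the decision in one pure function makes the "no breaking changes" property easy to test: every input yields a runnable plan.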

Platform support

Platform	Status	Technology
Linux	Fully supported	bubblewrap + socat
macOS	Fully supported	sandbox-exec (built-in)
WSL2	Fully supported	Same as Linux
Windows	Via WSL2 only	—

How to test

Install srt: npm install -g @anthropic-ai/sandbox-runtime
Install platform dependencies:
- Linux/WSL2: sudo apt-get install bubblewrap socat
- macOS: No additional dependencies needed

Enable sandbox in config:

# ~/.bloom/config.yaml
agent:
  claude:
    sandbox:
      enabled: true

Run bloom agent check to verify sandbox availability
Run bloom run and observe sandbox start/stop logs

Known limitations

Read isolation is deny-list based: The srt filesystem model allows reads by default; sensitive paths must be explicitly denied. Default config denies ~/.ssh, ~/.aws, ~/.gnupg.
~1.1s startup overhead: Caused by the Node.js runtime that srt runs on. Negligible for long-running agents.
~80-90 MB memory per instance: Tested successfully with 10 concurrent agents on an 8 GB system.
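The deny-list read model from the first limitation can be sketched as a path check: everything is readable unless it is a denied entry or sits underneath one. This is only an approximation of the semantics described here, not srt's implementation; `expandHome`, `isReadDenied`, and the assumption that paths are absolute, normalized POSIX paths are all illustrative.

```typescript
// Expand a leading "~" using the given home directory.
function expandHome(path: string, homeDir: string): string {
  if (path === "~") return homeDir;
  return path.indexOf("~/") === 0 ? homeDir + path.slice(1) : path;
}

// Reads are allowed by default; a path is denied only if it equals a denied
// entry or lives underneath one.
// Assumption: inputs are absolute, normalized POSIX paths (no "..", no
// symlink resolution).
function isReadDenied(target: string, denyReadPaths: string[], homeDir: string): boolean {
  const resolved = expandHome(target, homeDir);
  return denyReadPaths.some((entry) => {
    const denied = expandHome(entry, homeDir);
    return resolved === denied || resolved.indexOf(denied + "/") === 0;
  });
}

// Default protected paths listed in this PR.
const defaultDenyReadPaths = ["~/.ssh", "~/.aws", "~/.gnupg"];
```

The sketch also shows why this is a gap for multi-agent setups: any sensitive location that is not on the list, such as another agent's workspace, stays readable.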

Documentation

Sandbox Setup Guide - Platform-specific installation
Sandbox Policy Reference - All configuration options
Troubleshooting Guide - Common issues and solutions
Technology Evaluation - Detailed research findings

Test plan

All 852 CLI tests pass
133 sandbox-specific tests pass
Web build succeeds
Docs build succeeds
Agent tests show no regressions
Sandbox works in fallback mode (unsandboxed) when srt is not installed
Sandbox activates correctly when srt is available and enabled

🤖 Generated with Claude Code

Hands-on testing of Anthropic's srt (sandbox-runtime) v1.0.0 on Linux (Fedora 43). Documents filesystem isolation, network control, process isolation, benchmarks, and multi-instance testing results.

Key findings:
- Filesystem write isolation works via bubblewrap mount namespace
- Network filtering works via HTTP/SOCKS proxy with domain allow-lists
- Process isolation strong: PID namespace, zero capabilities, seccomp
- Startup overhead ~1.1s (Node.js), memory ~80-90MB per instance
- 10 concurrent instances work within ~800MB additional memory
- Read isolation is deny-list model (gap for multi-agent scenarios)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
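As a quick sanity check on the memory figures, the per-instance range scales linearly with the number of sandboxes. The helper below is a back-of-the-envelope sketch using only the approximate numbers reported here; `estimateSandboxMemory` is a hypothetical name, and linear scaling is an assumption.

```typescript
// Approximate per-instance memory reported by the hands-on testing.
const MEMORY_PER_INSTANCE_MB = { low: 80, high: 90 };

interface MemoryEstimate {
  lowMb: number;
  highMb: number;
}

// Assumption: memory grows linearly with the number of concurrent sandboxes.
function estimateSandboxMemory(instances: number): MemoryEstimate {
  if (instances < 0 || Math.floor(instances) !== instances) {
    throw new RangeError("instances must be a non-negative integer");
  }
  return {
    lowMb: instances * MEMORY_PER_INSTANCE_MB.low,
    highMb: instances * MEMORY_PER_INSTANCE_MB.high,
  };
}
```

For 10 instances this gives a range of 800 to 900 MB, in line with the ~800MB observed.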

Requirements matrix covering all seven evaluation areas with benchmark data from hands-on srt testing:
- Filesystem, network, and process isolation with evidence
- Cross-platform support matrix (macOS, Linux, WSL2)
- Rootless operation confirmed on all platforms
- Multi-instance benchmarks (10 concurrent, ~80MB/instance)
- Gap analysis with mitigation plan
- Recommendation: srt with sandbox-per-agent architecture
- All six PRD open questions answered with test data

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Implements the sandbox module at apps/cli/src/sandbox/ providing sandboxed process execution for agent isolation using Anthropic's srt (sandbox-runtime). Supports Linux (bubblewrap + socat) and macOS (sandbox-exec) platforms.
- SandboxConfig type with network policy, filesystem mounts, deny-read paths
- Config resolution merging defaults with user overrides
- Executor wrapping spawn() with srt command prefixing
- Platform backends for Linux and macOS with availability checks
- Graceful fallback to unsandboxed execution when srt is unavailable
- Sandbox disabled by default, activated via sandbox.enabled config
- Unit tests for config resolution, platform detection, and command building

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add AgentSandboxConfig type to core.ts for per-agent sandbox settings
- Add SandboxConfigSchema to user config for YAML-based configuration
- Wire sandbox config through factory (CreateAgentOptions → createAgentByName → GenericAgentProvider)
- Replace raw child_process.spawn() with createSandboxedSpawn() in both runStreaming() and runInteractive() methods
- Pass startingDirectory as the sandbox workspace mount path
- Sandbox is transparent to agents: disabled by default, gracefully degrades when srt is not available

Per-agent sandbox config example in ~/.bloom/config.yaml:

agent:
  claude:
    sandbox:
      enabled: true
      networkPolicy: deny-all

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add SandboxManager class that manages per-agent sandbox instances with independent configuration, process tracking, and lifecycle
- Each agent gets its own isolated sandbox with unique workspace path and spawn function that tracks child processes
- Add cleanup handlers for SIGTERM, SIGINT, uncaught exceptions, and unhandled rejections to prevent orphaned sandbox processes
- Graceful shutdown: SIGTERM to processes first, SIGKILL after 5s
- Export SandboxManager and helpers from sandbox module public API
- Add unit tests for SandboxManager (20 tests covering creation, retrieval, destruction, stats, concurrent instances, process tracking)
- Add integration tests for concurrent sandboxed agents (18 tests):
  - Per-agent isolation verification (workspace, config, srt settings)
  - 5 and 10 concurrent agents running in parallel without degradation
  - Independent output streams across concurrent agents
  - Normal and abnormal exit cleanup (no orphaned processes)
  - destroyInstance/destroyAll kill running processes
  - Memory usage stays reasonable with 10 instances
  - Workspace file isolation between concurrent agents
  - Orchestrator-compatible sandbox config resolution

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
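The graceful shutdown rule (SIGTERM first, SIGKILL after 5s) can be sketched as below. This is not the SandboxManager code: `Killable` and `terminateGracefully` are hypothetical names, and the sketch assumes the process is still running when it is called. The interface is kept structural so the logic can be exercised with a fake; Node's ChildProcess provides `kill()` and an `exit` event with this shape.

```typescript
// Minimal structural view of a child process.
interface Killable {
  kill(signal: "SIGTERM" | "SIGKILL"): boolean;
  once(event: "exit", listener: () => void): unknown;
}

// Ask the process to stop with SIGTERM, then escalate to SIGKILL if it has
// not exited within `graceMs` (5 seconds in this PR).
function terminateGracefully(proc: Killable, graceMs: number = 5000): void {
  const timer = setTimeout(() => {
    proc.kill("SIGKILL");
  }, graceMs);
  // Cancel the escalation as soon as the process reports that it exited.
  proc.once("exit", () => clearTimeout(timer));
  proc.kill("SIGTERM");
}
```

Registering the `exit` listener before sending SIGTERM avoids a window in which a fast-exiting process could be missed and later receive an unnecessary SIGKILL.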

Add comprehensive integration tests for the sandbox module:
- Test-agent runs successfully inside sandbox manager spawn
- Filesystem isolation: config denies reads to sensitive paths, limits writes
- Network isolation: deny-all and allow-list policies validated via srt settings
- Process isolation: SandboxManager tracks and terminates child processes
- Graceful fallback: sandbox degrades correctly when srt is unavailable
- Policy application: per-agent allow-list permits only specified hosts
- E2E: test-agent with various sandbox configs (deny-all, allow-list, fs restrict)

37 new tests, all existing 821 tests pass with no regressions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add sandbox setup guide covering macOS (Apple Silicon + Intel), Linux (x86_64), and Windows (WSL2) platforms with prerequisites, installation steps, and verification instructions
- Add policy configuration reference documenting all sandbox options including networkPolicy modes, allowedDomains, writablePaths, and denyReadPaths with common policy patterns
- Add troubleshooting guide with common issues, diagnostic steps, and platform-specific solutions
- Update README.md with sandbox section including quick setup guide and links to detailed documentation
- Update sidebars.ts to include new documentation pages

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Implement comprehensive sandbox lifecycle logging using Bloom's existing structured logger infrastructure:
- Add SandboxStartEvent and SandboxStopEvent types for structured logging
- Add PolicyViolation types for filesystem and network violations
- Implement logSandboxStart() for start events (agent name, workspace, policy, srt version)
- Implement logSandboxStop() for stop events (exit code, duration, killed status)
- Implement logPolicyViolation() for blocked filesystem/network access
- Add parseViolationsFromOutput() to parse srt stderr for violations
- Add helper functions createStartEvent() and createStopEvent()
- Add log level control: info for start/stop, warn for violations, debug for commands
- Export all new types and functions from sandbox module index

Log levels follow Bloom's patterns:
- info: Normal lifecycle events (start/stop)
- warn: Policy violations and abnormal exits
- debug: Detailed command construction

Includes 31 tests for logging behavior, all 852 CLI tests passing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
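The log-level rules listed in this commit can be summarized as one mapping function. The sketch below is illustrative: `SandboxLogEvent` is a simplified, hypothetical union (the real SandboxStartEvent, SandboxStopEvent, and PolicyViolation types carry more fields), and treating a non-zero exit code or a forced kill as an "abnormal exit" is an assumption.

```typescript
type LogLevel = "info" | "warn" | "debug";

// Simplified, hypothetical event union for illustration.
type SandboxLogEvent =
  | { kind: "start"; agent: string }
  | { kind: "stop"; agent: string; exitCode: number | null; killed: boolean }
  | { kind: "violation"; agent: string; resource: "filesystem" | "network"; target: string }
  | { kind: "command"; agent: string; command: string };

function logLevelFor(event: SandboxLogEvent): LogLevel {
  switch (event.kind) {
    case "start":
      return "info";
    case "stop":
      // Assumption: a non-zero exit code or a forced kill counts as abnormal.
      return event.exitCode === 0 && !event.killed ? "info" : "warn";
    case "violation":
      return "warn";
    case "command":
      return "debug";
  }
}
```

Centralizing the mapping keeps the start, stop, and violation loggers consistent with each other.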

… wrapping

Replace the CLI-wrapping approach (temp JSON files + `srt --settings <path>`) with direct library API calls via SandboxManager.initialize() and wrapWithSandbox().
- Delete platforms/ directory (linux.ts, macos.ts, index.ts), replaced by library's built-in SandboxManager.isSupportedPlatform() and checkDependencies()
- Rewrite executor.ts to lazy-import the library with graceful fallback
- Replace toSrtSettings/SrtSettings with toSandboxRuntimeConfig/SandboxRuntimeConfigCompat matching the library's config shape
- Make SandboxedSpawnFn async (→ Promise<ChildProcess>)
- Make SandboxManager.createInstance() async
- Remove cleanupSandboxTempFiles() (no more temp files)
- Add sandbox-runtime.d.ts for TypeScript when package not installed
- Update generic-provider.ts for async spawn signatures
- Update all test files for async API; delete platforms.test.ts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Reflect the refactor from CLI-wrapping (srt --settings) to using @anthropic-ai/sandbox-runtime as a library API. Remove references to global srt CLI installation, temp settings files, and srt --version verification. Update installation instructions to note the library is an optional dependency loaded automatically.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

cloudflare-workers-and-pages · 2026-02-05T04:00:13Z

Deploying bloom-docs with Cloudflare Pages

Latest commit:	`0df0950`
Status:	🚫 Build failed.


The validate job had no timeout, defaulting to GitHub's 6-hour max. Add 15-minute timeout matching the expected test duration to prevent CI from hanging indefinitely on flaky process cleanup.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

steveyackey and others added 17 commits February 4, 2026 12:41

Merge feature/agent-sandbox/tech-survey: Evaluate sandboxing technologies with hands-on testing
b12cd6b

Merge feature/agent-sandbox/sandbox-module: Build the sandbox module with public API
38b248d

Merge feature/agent-sandbox/integrate-provider: Integrate sandbox with GenericAgentProvider
1333eea

Merge feature/agent-sandbox/multi-agent: Validate multi-agent concurrent sandbox instances
c48b7a7

Merge feature/agent-sandbox/integration-tests: Add sandbox-specific integration tests and validate with test-agent
52c5cc8

Merge feature/agent-sandbox/docs: Write sandbox documentation and setup guides
6b14857

Merge feature/agent-sandbox/logging: Implement sandbox lifecycle logging
6724304

steveyackey merged commit 1caba03 into main Feb 5, 2026
4 checks passed

steveyackey deleted the integrate/agent-sandbox branch February 5, 2026 11:34

steveyackey merged 18 commits into main from integrate/agent-sandbox
