feat: Add agent sandbox for isolated execution #121
Merged
steveyackey merged 18 commits into main on Feb 5, 2026
Hands-on testing of Anthropic's srt (sandbox-runtime) v1.0.0 on Linux (Fedora 43). Documents filesystem isolation, network control, process isolation, benchmarks, and multi-instance testing results.
Key findings:
- Filesystem write isolation works via a bubblewrap mount namespace
- Network filtering works via an HTTP/SOCKS proxy with domain allow-lists
- Process isolation is strong: PID namespace, zero capabilities, seccomp
- Startup overhead ~1.1s (Node.js), memory ~80-90MB per instance
- 10 concurrent instances run within ~800MB of additional memory
- Read isolation uses a deny-list model (a gap for multi-agent scenarios)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
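The deny-list finding has the most design impact, so here is a minimal TypeScript sketch of what it means in practice. It assumes reads are allowed unless the target is, or sits under, a denied path; `isReadAllowed` and the example paths are hypothetical and do not reflect srt's implementation. Agent A's deny list covers its own secrets, yet another agent's workspace stays readable because nobody listed it.

```typescript
import * as path from "node:path";

// Hypothetical helper: a deny-list read check. Every path is readable unless it
// is one of the denied paths or sits somewhere below one of them.
function isReadAllowed(target: string, denyReadPaths: string[]): boolean {
  const resolved = path.resolve(target);
  return !denyReadPaths.some((denied) => {
    const base = path.resolve(denied);
    // Compare on a directory boundary so "/home/dev/.sshx" is not caught by "/home/dev/.ssh".
    const prefix = base.endsWith(path.sep) ? base : base + path.sep;
    return resolved === base || resolved.startsWith(prefix);
  });
}

// Agent A denies its own secrets but knows nothing about agent B's workspace.
const agentADenies = ["/home/dev/.ssh", "/home/dev/.aws"];

console.log(isReadAllowed("/home/dev/.ssh/id_ed25519", agentADenies)); // false
console.log(isReadAllowed("/work/agent-b/notes.md", agentADenies)); // true: the gap
```

Under an allow-list model the second call would be denied by default, which is why the deny-list behavior matters once several agents share one host.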
Requirements matrix covering all seven evaluation areas with benchmark data from hands-on srt testing:
- Filesystem, network, and process isolation with evidence
- Cross-platform support matrix (macOS, Linux, WSL2)
- Rootless operation confirmed on all platforms
- Multi-instance benchmarks (10 concurrent, ~80MB/instance)
- Gap analysis with mitigation plan
- Recommendation: srt with sandbox-per-agent architecture
- All six PRD open questions answered with test data

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…gies with hands-on testing
Implements the sandbox module at apps/cli/src/sandbox/, providing sandboxed process execution for agent isolation using Anthropic's srt (sandbox-runtime). Supports Linux (bubblewrap + socat) and macOS (sandbox-exec).
- SandboxConfig type with network policy, filesystem mounts, and deny-read paths
- Config resolution merging defaults with user overrides
- Executor wrapping spawn() with srt command prefixing
- Platform backends for Linux and macOS with availability checks
- Graceful fallback to unsandboxed execution when srt is unavailable
- Sandbox disabled by default, activated via the sandbox.enabled config
- Unit tests for config resolution, platform detection, and command building

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
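A minimal sketch of the "merge defaults with user overrides" step, assuming a flat config object. The field names come from this PR's docs (networkPolicy, allowedDomains, writablePaths, denyReadPaths) plus enabled; the exact types and every default other than `enabled: false` are assumptions, and `resolveSandboxConfig` is a hypothetical name, not the module's actual export.

```typescript
// Hypothetical config shape built from fields named in this PR; types and
// defaults other than `enabled: false` are assumptions.
type NetworkPolicy = "deny-all" | "allow-list";

interface SandboxConfig {
  enabled: boolean;
  networkPolicy: NetworkPolicy;
  allowedDomains: string[];
  writablePaths: string[];
  denyReadPaths: string[];
}

const DEFAULT_SANDBOX_CONFIG: SandboxConfig = {
  enabled: false, // sandbox is off unless the user opts in
  networkPolicy: "deny-all",
  allowedDomains: [],
  writablePaths: [],
  denyReadPaths: [],
};

// User values win field by field. List fields are replaced, not concatenated,
// and always copied so callers never share the defaults' arrays.
function resolveSandboxConfig(
  overrides: Partial<SandboxConfig> = {},
  defaults: SandboxConfig = DEFAULT_SANDBOX_CONFIG,
): SandboxConfig {
  return {
    enabled: overrides.enabled ?? defaults.enabled,
    networkPolicy: overrides.networkPolicy ?? defaults.networkPolicy,
    allowedDomains: [...(overrides.allowedDomains ?? defaults.allowedDomains)],
    writablePaths: [...(overrides.writablePaths ?? defaults.writablePaths)],
    denyReadPaths: [...(overrides.denyReadPaths ?? defaults.denyReadPaths)],
  };
}

// Example: opt in and allow a single registry host; other fields keep their defaults.
const example = resolveSandboxConfig({
  enabled: true,
  networkPolicy: "allow-list",
  allowedDomains: ["registry.npmjs.org"],
});
console.log(example.networkPolicy, example.allowedDomains, example.writablePaths);
```

Replacing list fields instead of concatenating them is a deliberate choice in this sketch: a user can narrow a default list, at the cost of restating any defaults they still want.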
- Add AgentSandboxConfig type to core.ts for per-agent sandbox settings
- Add SandboxConfigSchema to user config for YAML-based configuration
- Wire sandbox config through factory (CreateAgentOptions → createAgentByName → GenericAgentProvider)
- Replace raw child_process.spawn() with createSandboxedSpawn() in both runStreaming() and runInteractive() methods
- Pass startingDirectory as the sandbox workspace mount path
- Sandbox is transparent to agents: disabled by default, gracefully degrades when srt is not available
Per-agent sandbox config example in ~/.bloom/config.yaml:

    agent:
      claude:
        sandbox:
          enabled: true
          networkPolicy: deny-all

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…h GenericAgentProvider
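The YAML example above nests sandbox settings under each agent's name. A minimal sketch of the lookup needed before the factory hands settings to the provider, assuming the YAML has already been parsed into a plain object; `UserConfig`, `sandboxOverridesFor`, and `isSandboxEnabled` are hypothetical names. A missing agent entry, or a missing `sandbox` key, yields no overrides, so the sandbox stays off by default.

```typescript
// Hypothetical, trimmed-down view of ~/.bloom/config.yaml after YAML parsing.
interface AgentSandboxOverrides {
  enabled?: boolean;
  networkPolicy?: "deny-all" | "allow-list";
  allowedDomains?: string[];
}

interface UserConfig {
  agent?: Record<string, { sandbox?: AgentSandboxOverrides } | undefined>;
}

// A missing agent entry or a missing `sandbox` key means "no overrides".
function sandboxOverridesFor(config: UserConfig, agentName: string): AgentSandboxOverrides {
  return config.agent?.[agentName]?.sandbox ?? {};
}

// Only an explicit `enabled: true` turns the sandbox on; the default is off.
function isSandboxEnabled(config: UserConfig, agentName: string): boolean {
  return sandboxOverridesFor(config, agentName).enabled === true;
}

// Same settings as the YAML example.
const userConfig: UserConfig = {
  agent: { claude: { sandbox: { enabled: true, networkPolicy: "deny-all" } } },
};

console.log(isSandboxEnabled(userConfig, "claude")); // true
console.log(isSandboxEnabled(userConfig, "other-agent")); // false
```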
- Add SandboxManager class that manages per-agent sandbox instances with independent configuration, process tracking, and lifecycle
- Each agent gets its own isolated sandbox with a unique workspace path and a spawn function that tracks child processes
- Add cleanup handlers for SIGTERM, SIGINT, uncaught exceptions, and unhandled rejections to prevent orphaned sandbox processes
- Graceful shutdown: SIGTERM to processes first, SIGKILL after 5s
- Export SandboxManager and helpers from the sandbox module public API
- Add unit tests for SandboxManager (20 tests covering creation, retrieval, destruction, stats, concurrent instances, process tracking)
- Add integration tests for concurrent sandboxed agents (18 tests):
  - Per-agent isolation verification (workspace, config, srt settings)
  - 5 and 10 concurrent agents running in parallel without degradation
  - Independent output streams across concurrent agents
  - Normal and abnormal exit cleanup (no orphaned processes)
  - destroyInstance/destroyAll kill running processes
  - Memory usage stays reasonable with 10 instances
  - Workspace file isolation between concurrent agents
  - Orchestrator-compatible sandbox config resolution

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ent sandbox instances
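A minimal sketch of the per-agent bookkeeping described above: one registry entry per agent, a set of tracked child processes per entry, and SIGTERM followed by SIGKILL after a grace period (5 s by default, matching the commit). `SandboxRegistry`, `TrackedProcess`, and `terminateGracefully` are hypothetical names, not Bloom's `SandboxManager` API. Node's `ChildProcess` should fit the `TrackedProcess` shape, since it exposes `exitCode`, `signalCode`, and `kill()`.

```typescript
// Hypothetical sketch of per-agent sandbox bookkeeping; not Bloom's SandboxManager.
type StopSignal = "SIGTERM" | "SIGKILL";

interface TrackedProcess {
  exitCode: number | null;   // null until the process exits normally
  signalCode: string | null; // set instead of exitCode when a signal ended it
  kill(signal: StopSignal): boolean;
}

interface SandboxInstance {
  agentName: string;
  workspacePath: string;
  processes: Set<TrackedProcess>;
}

const isRunning = (p: TrackedProcess): boolean =>
  p.exitCode === null && p.signalCode === null;

// SIGTERM first; if the process is still alive after graceMs, SIGKILL it.
function terminateGracefully(p: TrackedProcess, graceMs = 5000): Promise<void> {
  if (!isRunning(p)) return Promise.resolve();
  p.kill("SIGTERM");
  return new Promise((resolve) => {
    setTimeout(() => {
      if (isRunning(p)) p.kill("SIGKILL");
      resolve();
    }, graceMs);
  });
}

class SandboxRegistry {
  private readonly instances = new Map<string, SandboxInstance>();

  create(agentName: string, workspacePath: string): SandboxInstance {
    if (this.instances.has(agentName)) {
      throw new Error(`sandbox already exists for agent "${agentName}"`);
    }
    const instance: SandboxInstance = {
      agentName,
      workspacePath,
      processes: new Set<TrackedProcess>(),
    };
    this.instances.set(agentName, instance);
    return instance;
  }

  get(agentName: string): SandboxInstance | undefined {
    return this.instances.get(agentName);
  }

  track(agentName: string, p: TrackedProcess): void {
    const instance = this.instances.get(agentName);
    if (!instance) throw new Error(`no sandbox for agent "${agentName}"`);
    instance.processes.add(p);
  }

  async destroy(agentName: string, graceMs = 5000): Promise<void> {
    const instance = this.instances.get(agentName);
    if (!instance) return;
    this.instances.delete(agentName);
    await Promise.all([...instance.processes].map((p) => terminateGracefully(p, graceMs)));
  }

  async destroyAll(graceMs = 5000): Promise<void> {
    // Snapshot the keys first because destroy() deletes entries from the map.
    await Promise.all([...this.instances.keys()].map((name) => this.destroy(name, graceMs)));
  }

  stats(): { instances: number; processes: number } {
    let processes = 0;
    for (const instance of this.instances.values()) processes += instance.processes.size;
    return { instances: this.instances.size, processes };
  }
}

// Process-level cleanup would call destroyAll() from hooks such as (hypothetical wiring):
// process.once("SIGINT", () => { void registry.destroyAll().finally(() => process.exit(130)); });
```

The SIGKILL decision is made after the grace period rather than right after SIGTERM because a terminating process usually exits asynchronously.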
Add comprehensive integration tests for the sandbox module:
- Test-agent runs successfully inside sandbox manager spawn
- Filesystem isolation: config denies reads to sensitive paths, limits writes
- Network isolation: deny-all and allow-list policies validated via srt settings
- Process isolation: SandboxManager tracks and terminates child processes
- Graceful fallback: sandbox degrades correctly when srt is unavailable
- Policy application: per-agent allow-list permits only specified hosts
- E2E: test-agent with various sandbox configs (deny-all, allow-list, fs restrict)

37 new tests; all 821 existing tests pass with no regressions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ntegration tests and validate with test-agent
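The allow-list tests above boil down to a host check. A minimal sketch, assuming allow-list entries are either exact hostnames or `*.domain` wildcards that match subdomains only; this illustrates the policy semantics and is not srt's proxy code. `isHostAllowed` and the example hosts are hypothetical.

```typescript
// Hypothetical policy check illustrating allow-list semantics; not srt's proxy code.
// Entries are exact hostnames or "*.domain" wildcards that match subdomains only.
type NetworkPolicy = "deny-all" | "allow-list";

function isHostAllowed(host: string, policy: NetworkPolicy, allowedDomains: string[]): boolean {
  if (policy === "deny-all") return false;
  // Hostnames are case-insensitive; also ignore a trailing root dot ("example.com.").
  const h = host.toLowerCase().replace(/\.$/, "");
  return allowedDomains.some((entry) => {
    const e = entry.toLowerCase();
    // "*.github.com" -> ".github.com": the leading dot forces a label boundary.
    if (e.startsWith("*.")) return h.endsWith(e.slice(1));
    return h === e;
  });
}

const allowed = ["api.anthropic.com", "*.github.com"];
console.log(isHostAllowed("api.anthropic.com", "allow-list", allowed)); // true
console.log(isHostAllowed("evilgithub.com", "allow-list", allowed)); // false
console.log(isHostAllowed("api.anthropic.com", "deny-all", allowed)); // false
```

Matching on the leading dot keeps `evilgithub.com` from passing a `*.github.com` entry, a common suffix-matching mistake.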
- Add sandbox setup guide covering macOS (Apple Silicon + Intel), Linux (x86_64), and Windows (WSL2) platforms with prerequisites, installation steps, and verification instructions
- Add policy configuration reference documenting all sandbox options, including networkPolicy modes, allowedDomains, writablePaths, and denyReadPaths, with common policy patterns
- Add troubleshooting guide with common issues, diagnostic steps, and platform-specific solutions
- Update README.md with a sandbox section including a quick setup guide and links to detailed documentation
- Update sidebars.ts to include the new documentation pages

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement comprehensive sandbox lifecycle logging using Bloom's existing structured logger infrastructure:
- Add SandboxStartEvent and SandboxStopEvent types for structured logging
- Add PolicyViolation types for filesystem and network violations
- Implement logSandboxStart() for start events (agent name, workspace, policy, srt version)
- Implement logSandboxStop() for stop events (exit code, duration, killed status)
- Implement logPolicyViolation() for blocked filesystem/network access
- Add parseViolationsFromOutput() to parse srt stderr for violations
- Add helper functions createStartEvent() and createStopEvent()
- Add log level control: info for start/stop, warn for violations, debug for commands
- Export all new types and functions from the sandbox module index

Log levels follow Bloom's patterns:
- info: Normal lifecycle events (start/stop)
- warn: Policy violations and abnormal exits
- debug: Detailed command construction

Includes 31 tests for logging behavior; all 852 CLI tests pass.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
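The log-level rules above are small enough to express directly. A minimal sketch of a stop event and the level it would be logged at, assuming the event carries the fields the commit lists (exit code, duration, killed status). The type shape, the `agentName` field, the exact "abnormal exit" rule (killed, non-zero, or missing exit code), and the names `buildStopEvent` and `stopEventLevel` are all assumptions, not Bloom's actual `createStopEvent` signature.

```typescript
// Hypothetical event shape modeled on the fields this commit lists; not Bloom's types.
type LogLevel = "debug" | "info" | "warn";

interface StopEvent {
  agentName: string;
  exitCode: number | null; // null when the process was terminated by a signal
  durationMs: number;
  killed: boolean;
}

function buildStopEvent(
  agentName: string,
  startedAtMs: number,
  exitCode: number | null,
  killed: boolean,
  nowMs: number = Date.now(),
): StopEvent {
  // Clamp at zero in case the clock moved backwards between start and stop.
  return { agentName, exitCode, durationMs: Math.max(0, nowMs - startedAtMs), killed };
}

// info for a clean stop; warn for an abnormal one (killed, non-zero, or no exit code).
// "debug" stays in the union because it is reserved for command construction.
function stopEventLevel(event: StopEvent): LogLevel {
  return event.killed || event.exitCode !== 0 ? "warn" : "info";
}

const clean = buildStopEvent("claude", 1000, 0, false, 4500);
console.log(clean.durationMs, stopEventLevel(clean)); // 3500 info
```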
… wrapping

Replace the CLI-wrapping approach (temp JSON files + `srt --settings <path>`) with direct library API calls via SandboxManager.initialize() and wrapWithSandbox().
- Delete the platforms/ directory (linux.ts, macos.ts, index.ts), replaced by the library's built-in SandboxManager.isSupportedPlatform() and checkDependencies()
- Rewrite executor.ts to lazy-import the library with graceful fallback
- Replace toSrtSettings/SrtSettings with toSandboxRuntimeConfig/SandboxRuntimeConfigCompat, matching the library's config shape
- Make SandboxedSpawnFn async (→ Promise<ChildProcess>)
- Make SandboxManager.createInstance() async
- Remove cleanupSandboxTempFiles() (no more temp files)
- Add sandbox-runtime.d.ts for TypeScript when the package is not installed
- Update generic-provider.ts for async spawn signatures
- Update all test files for the async API; delete platforms.test.ts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
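A minimal sketch of the lazy-import-with-fallback pattern, assuming only standard dynamic `import()`: the module is loaded at most once, and any load failure becomes `null` so the caller can run unsandboxed with a warning. `loadOptional` and `chooseExecutionMode` are hypothetical names. The library calls the commit mentions (`SandboxManager.initialize()`, `wrapWithSandbox()`, `isSupportedPlatform()`, `checkDependencies()`) are deliberately left out rather than guessed at.

```typescript
// Hypothetical helpers illustrating a lazy, optional import with graceful fallback.
// The promise is cached per specifier, so concurrent callers share one import attempt.
const moduleCache = new Map<string, Promise<unknown>>();

// Resolves to the module namespace, or to null when the module cannot be loaded.
function loadOptional(specifier: string): Promise<unknown> {
  let pending = moduleCache.get(specifier);
  if (pending === undefined) {
    // Any failure (package not installed, load error) resolves to null instead of throwing.
    pending = import(specifier).then(
      (mod: unknown) => mod,
      () => null,
    );
    moduleCache.set(specifier, pending);
  }
  return pending;
}

type ExecutionMode = "sandboxed" | "unsandboxed";

async function chooseExecutionMode(
  enabled: boolean,
  specifier = "@anthropic-ai/sandbox-runtime",
  warn: (message: string) => void = console.warn,
): Promise<ExecutionMode> {
  if (!enabled) return "unsandboxed"; // disabled: never touch the optional dependency
  const runtime = await loadOptional(specifier);
  if (runtime === null) {
    warn(`sandbox requested but ${specifier} could not be loaded; running unsandboxed`);
    return "unsandboxed";
  }
  // Platform/dependency checks and library initialization would go here;
  // they are omitted rather than guessed at.
  return "sandboxed";
}
```

Caching the promise rather than the resolved module means two agents starting at the same moment trigger a single import attempt and see the same outcome.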
Reflect the refactor from CLI-wrapping (srt --settings) to using @anthropic-ai/sandbox-runtime as a library API. Remove references to global srt CLI installation, temp settings files, and srt --version verification. Update installation instructions to note the library is an optional dependency loaded automatically. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The validate job had no timeout, defaulting to GitHub's 6-hour max. Add 15-minute timeout matching the expected test duration to prevent CI from hanging indefinitely on flaky process cleanup. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
This PR adds a complete agent sandbox system for Bloom, enabling isolated execution of AI agents with filesystem, network, and process restrictions.
What was built
Sandbox module (apps/cli/src/sandbox/) with: srt (Anthropic sandbox-runtime)
GenericAgentProvider integration - Agents automatically use sandboxed execution when configured via:
Comprehensive test suite - 133 sandbox-specific tests covering:
Architecture decisions
Sandbox-per-agent model: Each agent gets its own isolated sandbox instance with independent configuration, enabling per-agent network policies and clean isolation between concurrent agents.
srt (Anthropic sandbox-runtime): Selected after evaluating bubblewrap, gVisor, and Firecracker. srt was chosen because:
Graceful fallback: When srt is not installed or the sandbox is not supported, agents run unsandboxed with a warning—no breaking changes to existing workflows.
Platform support
How to test
- npm install -g @anthropic-ai/sandbox-runtime
- On Linux: sudo apt-get install bubblewrap socat
- bloom agent check to verify sandbox availability
- bloom run and observe sandbox start/stop logs
Known limitations
Documentation
Test plan
🤖 Generated with Claude Code