A containerized sandbox environment that enables AI agents to interact with terminal environments and web browsers programmatically.
The code was reconstructed from bytecode with Claude 3.7's help:
Manus Sandbox (this repo) is a container-based environment that provides a secure, isolated space for AI agents (particularly LLMs like Claude) to interact with terminal environments and web browsers. It acts as a bridge between the AI system and computing resources, allowing the AI to execute real-world tasks like:
- Running terminal commands
- Automating browser actions
- Managing files and directories
- Editing text files
This sandbox creates a controlled environment where AI systems can safely perform actions without having direct access to the host system.
```
┌───────────────────────────┐ ┌─────────────────┐ ┌────────────────────────────────────────────┐
│ │ │ │ │ Sandbox Container │
│ AI Agent (e.g. Claude) │ │ API Proxy │ │ │
│ │ │ │ │ ┌──────────┐ ┌─────────┐ ┌────────────┐ │
│ MANUS │ API Requests │ - Auth check │ │ │ │ │ │ │ │ │
│ │◄──────────────►│ - Rate limiting├─────►│ │ Terminal │ │ Browser │ │ File/Text │ │
│ │ & Responses │ - Routing │ │ │ Service │ │ Service │ │ Operations │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ │ └────┬─────┘ └────┬────┘ └─────┬──────┘ │
└───────────────────────────┘ └─────────────────┘ │ │ │ │ │
 x-sandbox-token │ │ │ │ │
 authentication │ ▼ ▼ ▼ │
 │ ┌──────────────────────────────────────┐ │
 │ │ FastAPI │ │
 │ │ (app/server.py + router.py) │ │
 │ └──────────────────────────────────────┘ │
 │ │
 └────────────────────────────────────────────┘
```
- **AI Agent**: The LLM (e.g., Claude) that sends API requests to the sandbox to perform tasks.
- **API Proxy**: An intermediary service (`https://api.manus.im/apiproxy.v1.ApiProxyService/CallApi`) that:
  - Authenticates requests using the `x-sandbox-token` header
  - Routes requests to the appropriate sandbox instance
  - Handles rate limiting and access control
- **Sandbox Container**: A Docker container that isolates the execution environment and provides:
  - FastAPI server (`app/server.py`) - the main entry point for HTTP requests
  - WebSocket server (`app/terminal_socket_server.py`) - for real-time terminal interaction
  - File and text editing capabilities (`app/tools/text_editor.py`)
- **browser_use Library**: A modified version of the browser-use library that:
  - Provides browser automation via Playwright
  - Has been specifically adapted to work with the Claude API (via `browser_use/agent/service.py`)
  - Handles browser actions, DOM interactions, and browser session management
The browser_use library is a key component of Manus Sandbox that enables browser automation. It provides a clean API for the AI to interact with web browsers programmatically.
It is MIT licensed, although the license file was missing from the original source code.
The `Agent` class is the main entry point for browser automation. It handles:
- Initializing browser sessions
- Processing LLM outputs into actions
- Managing state history
- Handling errors and retries
```python
class Agent:
    def __init__(
        self,
        task: str,
        llm: BaseChatModel,
        browser: Browser | None = None,
        # Many other parameters...
    ):
        # Initialize all components
        ...

    async def run(self, max_steps: int = 100) -> AgentHistoryList:
        # Main execution loop
        # Process LLM outputs and execute actions
        ...
```
The `BrowserContext` class manages the browser state and provides methods for interacting with web pages:
```python
class BrowserContext:
    async def navigate_to(self, url: str):
        """Navigate to a URL"""

    async def click_element(self, index: int):
        """Click an element using its index"""

    async def input_text_to_element(self, index: int, text: str, delay: float = 0):
        """Input text into an element"""
```
The `SystemPrompt` class defines the instructions given to the LLM about how to interact with the browser:
```python
class SystemPrompt:
    def important_rules(self) -> str:
        """
        Returns the important rules for the agent.
        """
        rules = """
1. RESPONSE FORMAT: You must ALWAYS respond with valid JSON in this exact format:
{
  "current_state": {
    "page_summary": "Quick detailed summary of new information from the current page which is not yet in the task history memory. Be specific with details which are important for the task. This is not on the meta level, but should be facts. If all the information is already in the task history memory, leave this empty.",
    "evaluation_previous_goal": "Success|Failed|Unknown - Analyze the current elements and the image to check if the previous goals/actions are successful like intended by the task. Ignore the action result. The website is the ground truth. Also mention if something unexpected happened like new suggestions in an input field. Shortly state why/why not",
    "memory": "Description of what has been done and what you need to remember. Be very specific. Count here ALWAYS how many times you have done something and how many remain. E.g. 0 out of 10 websites analyzed. Continue with abc and xyz",
    "next_goal": "What needs to be done with the next actions"
  },
  "action": [
    {
      "one_action_name": {
        // action-specific parameter
      }
    },
    // ... more actions in sequence
  ]
}
"""
        # More rules follow...
        return rules
```
The prompt instructs the LLM on:
- How to format its responses (JSON structure)
- Rules for interacting with browser elements
- Navigation and error handling
- Task completion criteria
- Element interaction guidelines
The `Registry` class provides a way to register and execute actions:
```python
class Registry:
    def action(
        self,
        description: str,
        param_model: Optional[Type[BaseModel]] = None,
    ):
        """Decorator for registering actions"""

    async def execute_action(
        self,
        action_name: str,
        params: dict,
        browser: Optional[BrowserContext] = None,
        # Other parameters
    ) -> Any:
        """Execute a registered action"""
```
The communication between an AI agent (like Claude) and the sandbox follows this flow:
1. **AI Agent Formulates a Request**:
   - The AI decides on an action to perform (e.g., run a terminal command, navigate a browser)
   - It constructs an appropriate API request following the sandbox API specification
2. **Request Transmission**: The AI sends an HTTP request either:
   - Directly to the sandbox container (if exposed)
   - Through the API proxy service (`https://api.manus.im/apiproxy.v1.ApiProxyService/CallApi`)
3. **Authentication**:
   - The request includes an API token (`x-sandbox-token` header)
   - The token is verified against the value stored in `$HOME/.secrets/sandbox_api_token`
4. **Request Processing**:
   - The sandbox FastAPI server receives and processes the request
   - It routes the request to the appropriate service (terminal, browser, or file operations)
   - The requested action is performed within the isolated container environment
5. **Response Return**:
   - Results of the action are formatted as JSON or binary data (for file downloads)
   - The response is sent back to the AI agent
6. **Real-time Communication (for terminals)**:
   - Terminal sessions use WebSockets for bidirectional, real-time communication
   - The AI can receive terminal output as it is generated and send new commands
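Assembling such a request can be sketched as follows. The proxy URL and header name come from this document; the payload shape (an `api_name` plus a `body`) is an assumption for illustration:

```python
import json

# Proxy endpoint named in this document
PROXY_URL = "https://api.manus.im/apiproxy.v1.ApiProxyService/CallApi"

def build_sandbox_request(token: str, api_name: str, body: dict) -> dict:
    """Return the pieces of an HTTP request to the sandbox proxy.

    The payload structure here is illustrative; the real proxy protocol
    may differ.
    """
    return {
        "url": PROXY_URL,
        "headers": {
            "x-sandbox-token": token,  # authentication (step 3 above)
            "content-type": "application/json",
        },
        "payload": json.dumps({"api_name": api_name, "body": body}),
    }

req = build_sandbox_request(
    "tok123", "terminal_execute", {"command": "ls -la", "terminal_id": "main"}
)
print(req["headers"]["x-sandbox-token"])  # tok123
```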
```
┌─────────────┐                  ┌───────────────┐               ┌──────────────────┐
│             │ 1. HTTP Request  │               │ 2. Route to   │                  │
│  AI Agent   │─────────────────►│  Sandbox API  │──────────────►│ Terminal Service │
│             │                  │   (FastAPI)   │               │                  │
│             │◄─────────────────│               │◄──────────────│                  │
└─────────────┘ 4. JSON Response └───────────────┘ 3. Execute    └──────────────────┘
                                                     Command
```
The sandbox includes a Python API client (`data_api.py`) that communicates with the proxy service:
```python
from data_api import ApiClient

# Initialize the client
api_client = ApiClient()

# Call a terminal command
response = api_client.call_api(
    "terminal_execute",
    body={
        "command": "ls -la",
        "terminal_id": "main"
    }
)
print(response)
```
When interacting with browser_use, the LLM (like Claude) must format its responses as JSON according to the schema defined in the system prompt:
```json
{
  "current_state": {
    "page_summary": "Found search page with 10 results for 'electric cars'",
    "evaluation_previous_goal": "Success - successfully navigated to search page and performed search as intended",
    "memory": "Completed search for 'electric cars'. Need to extract information from first 3 results (0 of 3 done)",
    "next_goal": "Extract detailed information from first search result"
  },
  "action": [
    {
      "click_element": {
        "index": 12
      }
    }
  ]
}
```
This response structure allows the Agent to:
- Track the LLM's understanding of the current page
- Evaluate the success of previous actions
- Maintain memory across interactions
- Execute the next action(s)
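To make the contract concrete, here is a small, dependency-free sketch (an assumption, not code from the repository) that parses such a reply and checks it has the required shape before acting on it:

```python
import json

# Keys the system prompt requires inside "current_state"
REQUIRED_STATE_KEYS = {"page_summary", "evaluation_previous_goal", "memory", "next_goal"}

def validate_llm_response(raw: str) -> dict:
    """Parse a browser_use-style LLM reply and verify its structure."""
    data = json.loads(raw)
    state = data["current_state"]
    missing = REQUIRED_STATE_KEYS - state.keys()
    if missing:
        raise ValueError(f"missing state keys: {missing}")
    if not isinstance(data["action"], list) or not data["action"]:
        raise ValueError("'action' must be a non-empty list")
    return data

reply = (
    '{"current_state": {"page_summary": "", '
    '"evaluation_previous_goal": "Success", '
    '"memory": "0 of 3 done", "next_goal": "click result"}, '
    '"action": [{"click_element": {"index": 12}}]}'
)
parsed = validate_llm_response(reply)
print(parsed["action"][0])  # {'click_element': {'index': 12}}
```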
The browser_use library provides a wide range of actions for web automation:
- `go_to_url`: Navigate to a specific URL
- `search_google`: Perform a Google search
- `go_back`: Navigate back in browser history
- `open_tab`: Open a new browser tab
- `switch_tab`: Switch between browser tabs

- `click_element`: Click on a page element by its index
- `input_text`: Type text into a form field
- `scroll_down` / `scroll_up`: Scroll the page
- `scroll_to_text`: Scroll to find specific text
- `select_dropdown_option`: Select from dropdown menus

- `extract_content`: Extract and process page content
- `get_dropdown_options`: Get all options from a dropdown

- `done`: Mark the task as complete and return results
To integrate an LLM with this sandbox:
1. **API Client Implementation**: Create an API client in the LLM's execution environment
2. **Task Planning**: The LLM should break down user requests into specific API calls
3. **Sequential Operations**: Complex tasks often require multiple API calls in sequence
4. **Error Handling**: The LLM should interpret error responses and adjust its approach
5. **State Management**: For multi-step operations, the LLM needs to track the state of the environment
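Points 3-5 above can be sketched as a small retry loop. The `call_api(name, body=...)` shape mirrors the `data_api` client shown in this document; the stub client and the retry policy are purely illustrative assumptions:

```python
def run_steps(call_api, steps, max_retries: int = 2) -> list:
    """Execute API calls in order, retrying failures and recording state.

    `steps` is a list of (api_name, body) pairs; the returned history
    records each call's outcome so a caller can reason about state.
    """
    history = []
    for name, body in steps:
        for attempt in range(max_retries + 1):
            try:
                result = call_api(name, body=body)
                history.append((name, "ok", result))
                break
            except Exception as exc:
                if attempt == max_retries:
                    history.append((name, "failed", str(exc)))
    return history

# Stub client for illustration: fails once with a transient error, then succeeds.
calls = {"n": 0}
def fake_call_api(name, body=None):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient error")
    return {"status": "ok"}

history = run_steps(fake_call_api, [("terminal_execute", {"command": "ls"})])
print(history[-1][1])  # ok
```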
For example:

1. User asks the LLM to "Create a Python script that fetches weather data and save it"
2. The LLM plans the steps:
   - Create a new Python file
   - Write the code to fetch weather data
   - Save the file
   - Run the script to test it
   - Show the results to the user
3. The LLM executes each step by making API calls to the sandbox:
   - `POST /text_editor` with `command: "create"` to create a new file
   - `POST /text_editor` with `command: "write"` to write the code
   - `POST /terminal/{id}/write` to run the script
   - `GET /terminal/{id}` to get the output
   - Return the results to the user
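The call sequence above can be expressed as a plan of requests. The endpoint paths follow those mentioned in this document; the terminal id `"main"`, the file path, and the exact body fields are assumptions for illustration:

```python
def plan_weather_script_task(path: str = "/home/user/weather.py") -> list:
    """Return the ordered sandbox API calls for the example workflow above."""
    code = "import urllib.request\n# ... fetch weather data ...\n"
    return [
        # Step 1: create the file
        {"method": "POST", "endpoint": "/text_editor",
         "body": {"command": "create", "path": path}},
        # Step 2: write the code
        {"method": "POST", "endpoint": "/text_editor",
         "body": {"command": "write", "path": path, "file_text": code}},
        # Step 3: run the script in terminal "main" (hypothetical id)
        {"method": "POST", "endpoint": "/terminal/main/write",
         "body": {"text": f"python {path}\n"}},
        # Step 4: read the terminal output
        {"method": "GET", "endpoint": "/terminal/main", "body": None},
    ]

steps = plan_weather_script_task()
print([s["endpoint"] for s in steps])
```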
1. **Multi-layered Authentication**:
   - API token authentication using the `x-sandbox-token` header (NOT IMPLEMENTED IN THIS CODE)
   - Token verification happens at the proxy layer before requests reach the FastAPI application (NOT IMPLEMENTED IN THIS CODE)
   - Tokens are stored securely in `$HOME/.secrets/sandbox_api_token`
2. **Proxy Service Protection**:
   - The proxy service provides an additional layer of security
   - Acts as a gatekeeper for all requests to the sandbox
   - Can implement rate limiting, request validation, and access control
3. **Isolation**:
   - The Docker container provides isolation from the host system
   - Prevents the AI from affecting the host machine directly
4. **Resource Limitations**:
   - The sandbox can be configured with resource constraints (CPU, memory) at the Docker level
   - Prevents resource exhaustion attacks
5. **Action Restrictions**:
   - The API can be configured to restrict certain dangerous operations
   - Browser automation is contained within the sandbox environment
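A token check like the one described under Multi-layered Authentication could be sketched as follows. As noted above, the real verification happens at the proxy layer and is not part of this repository's code; only the token path is taken from this document:

```python
import hmac
import tempfile
from pathlib import Path

# Token location named in this document
DEFAULT_TOKEN_PATH = Path.home() / ".secrets" / "sandbox_api_token"

def verify_token(header_value: str, token_path: Path = DEFAULT_TOKEN_PATH) -> bool:
    """Compare an x-sandbox-token header value against the stored token."""
    try:
        expected = token_path.read_text().strip()
    except FileNotFoundError:
        return False
    # Constant-time comparison avoids leaking information via timing
    return hmac.compare_digest(header_value, expected)

# Demonstration with a throwaway token file
with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "sandbox_api_token"
    path.write_text("s3cret-token\n")
    ok = verify_token("s3cret-token", path)
    bad = verify_token("wrong-token", path)
print(ok, bad)  # True False
```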
The sandbox is designed to run in a Docker container. The provided Dockerfile was not in the original code but gives an idea of what the container could look like:
- A Python 3.11 environment
- Chromium browser for web automation
- All necessary dependencies
- API token initialization
To build and run the container:
```bash
# Build the container
docker build -t manus-sandbox .

# Run the container
docker run -p 8330:8330 manus-sandbox
```
This project is reconstructed from bytecode with Claude 3.7's help, and it demonstrates the advanced capabilities of container-based AI sandboxes. The browser_use component is a modified version of the open-source browser-use library.