Merged
34 commits
cedc25b
ruff precommit
filip-michalsky Jan 15, 2026
7280e65
smoke test env
filip-michalsky Jan 15, 2026
9cfd1c6
simplify smoke test
filip-michalsky Jan 15, 2026
c93bca2
delete datasets from wrong place
filip-michalsky Jan 16, 2026
8a978c5
restructure
filip-michalsky Jan 16, 2026
3d6d559
Remove gaia, mind2web, and webvoyager environment folders
filip-michalsky Jan 16, 2026
5db1d16
increment examples
filip-michalsky Jan 16, 2026
e3024b8
update readme and auto start cua server
filip-michalsky Jan 16, 2026
01eac47
add env check
filip-michalsky Jan 16, 2026
79e04d0
update readme
filip-michalsky Jan 16, 2026
0affb42
update agents md
filip-michalsky Jan 16, 2026
49f5b95
fix tests
filip-michalsky Jan 16, 2026
813eca4
update tests
filip-michalsky Jan 16, 2026
906a836
Remove gaia, webvoyager, mind2web from tracking
filip-michalsky Jan 16, 2026
e688da6
make bugbot happier
filip-michalsky Jan 16, 2026
64096e6
Fm/browser sandbox env (#3)
filip-michalsky Jan 24, 2026
278ae78
move cua server to assets
filip-michalsky Jan 24, 2026
3ef51bb
update readmes
filip-michalsky Jan 24, 2026
be0762f
Merge branch 'main' into fm/add-browser-env
filip-michalsky Jan 24, 2026
40f5b44
DRY modes
filip-michalsky Jan 24, 2026
38d8ae1
Merge branch 'fm/add-browser-env' of https://github.com/filip-michals…
filip-michalsky Jan 24, 2026
5dd0649
fix act in dom mode-small dict schema issue
filip-michalsky Jan 26, 2026
71899f5
update README to recommend max turns as 50 in examples
filip-michalsky Jan 27, 2026
310d8c9
update assets
filip-michalsky Jan 27, 2026
3a6a365
proxy bug
filip-michalsky Jan 28, 2026
ddf8fb4
remove duplicate code
filip-michalsky Jan 28, 2026
311164c
add advanced stealth flag
filip-michalsky Jan 28, 2026
9f4bb13
update logging
filip-michalsky Jan 28, 2026
732cb07
make bugbot happy
filip-michalsky Jan 28, 2026
9288b11
remove system prompts from browser_env
cdreetz Jan 29, 2026
d253e1d
remove references to sys prompts and tests for sys prpompts
cdreetz Jan 29, 2026
d96787d
add prompt to examples
filip-michalsky Jan 29, 2026
0373e01
local browser config only local
filip-michalsky Jan 29, 2026
907fddd
streamlining for call_tool fix + ty
willccbb Jan 29, 2026
4 changes: 4 additions & 0 deletions .gitignore
@@ -46,3 +46,7 @@ scratch/
.vscode/
*.swp
.DS_Store

# CUA server (local dev artifacts)
assets/templates/browserbase/cua/node_modules/
assets/templates/browserbase/cua/pnpm-lock.yaml
21 changes: 21 additions & 0 deletions assets/templates/browserbase/cua/.dockerignore
@@ -0,0 +1,21 @@
# Ignore node_modules to ensure fresh install with correct architecture
node_modules

# Note: dist is NOT ignored because Dockerfile.runtime needs to copy the binary from it

# Ignore git
.git
.gitignore

# Ignore environment files
.env
.env.*

# Ignore IDE files
.vscode
.idea
*.swp
*.swo

# Ignore macOS files
.DS_Store
11 changes: 11 additions & 0 deletions assets/templates/browserbase/cua/.env.example
@@ -0,0 +1,11 @@
# CUA Server Environment Variables
# Copy this file to .env and fill in your values

# OpenAI API key (placeholder).
# Stagehand is used here only for its CDP management, so the key is never
# actually used, but it must be set because Stagehand's type signature requires it.
OPENAI_API_KEY=sk-your-openai-api-key-here

# Server Configuration (optional)
CUA_SERVER_PORT=3000
CUA_SERVER_HOST=0.0.0.0
35 changes: 35 additions & 0 deletions assets/templates/browserbase/cua/Dockerfile.build
@@ -0,0 +1,35 @@
# Dockerfile for building CUA server SEA binary (linux-x64)
#
# This creates a linux-x64 binary that can be deployed to sandboxes.
#
# Usage:
# docker build --platform linux/amd64 -f Dockerfile.build -t cua-builder .
# docker run --rm --platform linux/amd64 -v $(pwd)/dist:/output cua-builder
#
# Or use the npm script:
# pnpm build:binary:docker

FROM --platform=linux/amd64 node:22-slim

WORKDIR /app

# Install pnpm
RUN npm install -g pnpm

# Copy package files first for better caching
COPY package.json pnpm-lock.yaml* ./

# Install dependencies
RUN pnpm install --frozen-lockfile 2>/dev/null || pnpm install

# Copy source files
COPY . .

# Build the binary (output goes to /app/dist/sea/)
RUN pnpm build:binary

# Move built files to /build so volume mount doesn't shadow them
RUN mv dist /build

# Default command copies the binary to the mounted /output volume
CMD cp -r /build/sea /output/
34 changes: 34 additions & 0 deletions assets/templates/browserbase/cua/Dockerfile.runtime
@@ -0,0 +1,34 @@
# Dockerfile for pre-built CUA server runtime image
#
# This creates a minimal runtime image with the binary already included,
# eliminating the need to upload files and install dependencies at runtime.
#
# Build:
# docker build --platform linux/amd64 -f Dockerfile.runtime -t cua-server:local .
#
# Push to Docker Hub:
# docker tag cua-server:local deepdream19/cua-server:latest
# docker push deepdream19/cua-server:latest
#
# Note: Requires the binary to be built first via Dockerfile.build

FROM --platform=linux/amd64 node:22-slim

WORKDIR /app/cua-server

# Install curl for health checks
RUN apt-get update -qq && apt-get install -y -qq curl && rm -rf /var/lib/apt/lists/*

# Copy pre-built binary
COPY dist/sea/cua-server-linux-x64 ./cua-server-linux-x64
RUN chmod +x ./cua-server-linux-x64

# Default environment variables
ENV CUA_SERVER_PORT=3000
ENV CUA_SERVER_HOST=0.0.0.0

# Expose the server port
EXPOSE 3000

# Run the server
CMD ["./cua-server-linux-x64"]
274 changes: 274 additions & 0 deletions assets/templates/browserbase/cua/README.md
@@ -0,0 +1,274 @@
# CUA Primitives API Server

A Fastify server that exposes Stagehand's Computer Use Agent (CUA) browser primitives as REST endpoints, enabling external agents to control browser sessions remotely.

> **Note**: This server is automatically deployed to sandbox containers when using `BrowserEnv` with `mode="cua"` and `use_sandbox=True` (the default). You typically don't need to run this server manually unless you're doing local development.

## Automatic Sandbox Deployment

When using `BrowserEnv(mode="cua")`, the server is automatically:
1. Uploaded to a sandbox container
2. Started via `setup.sh`
3. Accessed via curl commands inside the sandbox
4. Cleaned up when the rollout completes

```python
# This automatically deploys the CUA server to a sandbox
env = BrowserEnv(
    mode="cua",
    dataset=dataset,
    rubric=rubric,
)
```

## Manual Usage (Local Development)

For local development or debugging, you can run the server manually:

```bash
# Start the server (with hot reload)
pnpm dev

# Or start without hot reload
pnpm start

# Custom port via environment variable
CUA_SERVER_PORT=8080 pnpm dev
```

Then configure BrowserEnv to use the manual server:

```python
env = BrowserEnv(
    mode="cua",
    use_sandbox=False,
    server_url="http://localhost:3000",
    dataset=dataset,
    rubric=rubric,
)
```

## Architecture

```
External Agent -> Fastify API -> BrowserSessionManager -> Stagehand Page -> Browser
```

## Prerequisites

```bash
npm install @browserbasehq/stagehand fastify
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `CUA_SERVER_PORT` | `3000` | Server port |
| `CUA_SERVER_HOST` | `0.0.0.0` | Server host |

## API Endpoints

### Health Check

```bash
GET /health
```

Returns server status and active session count.

### List Sessions

```bash
GET /sessions
```

Returns array of active session IDs.

### Create Session

```bash
POST /sessions
Content-Type: application/json

{
  "env": "LOCAL", // or "BROWSERBASE"
  "viewport": {
    "width": 1024,
    "height": 768
  }
}
```

Returns:
```json
{
  "sessionId": "session_1234567890_abc123",
  "state": {
    "screenshot": "base64...",
    "url": "about:blank",
    "viewport": { "width": 1024, "height": 768 }
  }
}
```

### Get Session State

```bash
GET /sessions/:id/state
```

Returns current browser state (screenshot, URL, viewport).

### Close Session

```bash
DELETE /sessions/:id
```

Closes the browser and removes the session.

### Execute Action

```bash
POST /sessions/:id/action
Content-Type: application/json

{
  "type": "click",
  "x": 100,
  "y": 200
}
```

Returns:
```json
{
  "success": true,
  "state": {
    "screenshot": "base64...",
    "url": "https://example.com",
    "viewport": { "width": 1024, "height": 768 }
  }
}
```
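
The same request can be issued from Python. The following is a minimal sketch using only the standard library; the function names `build_action_request` and `send` are illustrative and not part of the server or of `BrowserEnv`.

```python
import json
import urllib.request


def build_action_request(base_url: str, session_id: str, action: dict) -> urllib.request.Request:
    """Prepare a POST /sessions/:id/action request (illustrative helper)."""
    url = f"{base_url.rstrip('/')}/sessions/{session_id}/action"
    body = json.dumps(action).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running locally, `send(build_action_request("http://localhost:3000", session_id, {"type": "click", "x": 100, "y": 200}))` would be expected to return a dictionary shaped like the response above.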

## Available Actions

### Mouse Actions

| Action | Parameters | Description |
|--------|------------|-------------|
| `click` | `x`, `y`, `button?`, `clickCount?` | Click at coordinates |
| `double_click` | `x`, `y` | Double-click at coordinates |
| `tripleClick` | `x`, `y` | Triple-click at coordinates |
| `drag` | `path: [{x, y}, ...]` | Drag along path |
| `move` | - | No-op (cursor visualization) |

### Keyboard Actions

| Action | Parameters | Description |
|--------|------------|-------------|
| `type` | `text` | Type text into focused element |
| `keypress` | `keys` (string or array) | Press keyboard keys |

### Navigation Actions

| Action | Parameters | Description |
|--------|------------|-------------|
| `goto` | `url` | Navigate to URL |
| `back` | - | Go back in history |
| `forward` | - | Go forward in history |
| `scroll` | `x?`, `y?`, `scroll_x?`, `scroll_y?` | Scroll the page |

### Utility Actions

| Action | Parameters | Description |
|--------|------------|-------------|
| `wait` | `timeMs?` (default: 1000) | Wait for duration |
| `screenshot` | - | No-op (always returned in response) |
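
A client may want to check an action payload before sending it. The sketch below transcribes the tables above into a lookup of required parameters. Treating every parameter without a `?` as required is an assumption drawn from the table notation, and the names `REQUIRED_PARAMS` and `missing_params` are hypothetical.

```python
from typing import Dict, List, Tuple

# Required parameters per action type, transcribed from the tables above.
# Parameters marked with "?" in the tables are optional and omitted here.
REQUIRED_PARAMS: Dict[str, Tuple[str, ...]] = {
    "click": ("x", "y"),
    "double_click": ("x", "y"),
    "tripleClick": ("x", "y"),
    "drag": ("path",),
    "move": (),
    "type": ("text",),
    "keypress": ("keys",),
    "goto": ("url",),
    "back": (),
    "forward": (),
    "scroll": (),
    "wait": (),
    "screenshot": (),
}


def missing_params(action: dict) -> List[str]:
    """Return the required parameters that are absent from an action payload."""
    action_type = action.get("type")
    if action_type not in REQUIRED_PARAMS:
        raise ValueError(f"unknown action type: {action_type!r}")
    return [name for name in REQUIRED_PARAMS[action_type] if name not in action]
```

An empty list means the payload carries every parameter the tables list as required; it says nothing about whether the server accepts the values.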

## Example Usage

```bash
# Create a session
SESSION=$(curl -s -X POST http://localhost:3000/sessions | jq -r '.sessionId')

# Navigate to a website
curl -X POST http://localhost:3000/sessions/$SESSION/action \
  -H "Content-Type: application/json" \
  -d '{"type": "goto", "url": "https://example.com"}'

# Click a button
curl -X POST http://localhost:3000/sessions/$SESSION/action \
  -H "Content-Type: application/json" \
  -d '{"type": "click", "x": 150, "y": 300}'

# Type into an input
curl -X POST http://localhost:3000/sessions/$SESSION/action \
  -H "Content-Type: application/json" \
  -d '{"type": "type", "text": "Hello, World!"}'

# Press Enter
curl -X POST http://localhost:3000/sessions/$SESSION/action \
  -H "Content-Type: application/json" \
  -d '{"type": "keypress", "keys": "Enter"}'

# Scroll down
curl -X POST http://localhost:3000/sessions/$SESSION/action \
  -H "Content-Type: application/json" \
  -d '{"type": "scroll", "x": 640, "y": 360, "scroll_y": 500}'

# Close the session
curl -X DELETE http://localhost:3000/sessions/$SESSION
```

## Response Format

All action responses include the full browser state:

```typescript
interface ActionResponse {
  success: boolean;
  error?: string;
  state: {
    screenshot: string; // base64 PNG
    url: string;
    viewport: {
      width: number;
      height: number;
    };
  };
}
```
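
On the Python side, a response of this shape could be parsed as in the following sketch. The dataclass and function names are illustrative; only the field names come from the interface above.

```python
import base64
from dataclasses import dataclass
from typing import Optional


@dataclass
class BrowserState:
    screenshot_png: bytes  # decoded from the base64 "screenshot" field
    url: str
    width: int
    height: int


@dataclass
class ActionResult:
    success: bool
    error: Optional[str]
    state: BrowserState


def parse_action_response(payload: dict) -> ActionResult:
    """Convert a decoded ActionResponse JSON object into typed Python values."""
    state = payload["state"]
    viewport = state["viewport"]
    return ActionResult(
        success=bool(payload["success"]),
        error=payload.get("error"),
        state=BrowserState(
            screenshot_png=base64.b64decode(state["screenshot"]),
            url=state["url"],
            width=int(viewport["width"]),
            height=int(viewport["height"]),
        ),
    )
```

Decoding the screenshot once at the boundary means callers work with raw PNG bytes and never with the base64 string.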

## Error Handling

Errors return appropriate HTTP status codes:

- `404` - Session not found
- `500` - Action execution failed

```json
{
  "error": "Session session_123 not found",
  "code": "SESSION_NOT_FOUND"
}
```
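
A client could turn such an error body into an exception, as in this sketch. `CUAServerError` and `error_from_response` are hypothetical names, and the fallback for a non-JSON body is an assumption about how a client might want to degrade.

```python
import json
from typing import Optional


class CUAServerError(Exception):
    """Client-side error carrying the HTTP status and the server's error code."""

    def __init__(self, status: int, message: str, code: Optional[str] = None):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message
        self.code = code


def error_from_response(status: int, body: bytes) -> CUAServerError:
    """Build an exception from an error response; tolerate non-JSON bodies."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except ValueError:  # covers both UnicodeDecodeError and JSONDecodeError
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return CUAServerError(status, payload.get("error", "unknown error"), payload.get("code"))
```

When using `urllib`, the status and body could come from a caught `urllib.error.HTTPError` via its `code` attribute and `read()` method.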

## File Structure

```
cua-server/
├── index.ts # Entry point
├── server.ts # Fastify routes
├── sessionManager.ts # Browser session lifecycle
├── actionExecutor.ts # CUA primitive execution
├── stateCapture.ts # Screenshot & state helpers
├── types.ts # TypeScript types
├── setup.sh # Sandbox initialization script (used by CUASandboxMode)
├── package.json # Dependencies
├── tsconfig.json # TypeScript configuration
└── README.md # This file
```
