Delegate 1 is a single-threaded, single-session, multi-channel AI assistant that provides a seamless conversational experience across multiple communication channels. Unlike traditional AI assistants that handle each interaction in isolation, Delegate 1 maintains a unified conversation thread that spans different input and output modalities.
The core purpose of Delegate 1 is to create a truly integrated AI assistant that can:
- Maintain Context Across Channels: Continue conversations seamlessly whether you're interacting via text, voice, or phone calls
- Single Session Management: All interactions are managed within a single, persistent session thread, ensuring conversation continuity and context preservation
- Multi-Modal Communication: Support for text-based chat, real-time voice conversations, and traditional phone calls via Twilio integration
- Real-Time Responsiveness: Leverage OpenAI's Realtime API for low-latency, natural conversational experiences
Delegate 1 employs a backend-centric architecture that centralizes session management and conversation state. In this design:
- All communication channels connect to a single, unified session object
- Conversation history and context are maintained across channel switches
- Real-time event streaming for observability and monitoring
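The snippet below is a purely illustrative sketch of this idea (the type and function names are not Delegate 1's actual code): one session object owns the conversation history and every channel connection, so each channel reads and writes the same state.
// Illustrative sketch only — not the actual Delegate 1 types.
type Channel = 'text' | 'voice' | 'phone' | 'api';

interface SessionState {
  conversation: { role: 'user' | 'assistant'; channel: Channel; content: string }[];
  connections: Partial<Record<Channel, unknown>>; // WebSocket, Twilio media stream, etc.
}

const session: SessionState = { conversation: [], connections: {} };

// Every channel appends to the same history, so context survives channel switches.
function recordTurn(channel: Channel, role: 'user' | 'assistant', content: string) {
  session.conversation.push({ role, channel, content });
}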
The system supports multiple communication channels:
- Text Channel: Traditional text-based chat interface
- Voice Channel: Real-time voice conversations using WebRTC
- Phone Channel: Traditional phone calls via Twilio integration
- API Channel: Programmatic access for external integrations
The technology stack includes:
- OpenAI Realtime API: Core conversational AI capabilities
- Next.js + TypeScript: Frontend web application
- Express.js: Backend server for session management
- WebSocket: Real-time communication between frontend and backend
- Twilio: Voice calling infrastructure
- OpenAI Agents SDK: Agent orchestration and handoff capabilities
This project builds upon two key reference implementations:
- OpenAI Realtime Agents: Provides the foundation for multi-modal agent interactions with text and voice capabilities
- Twilio Demo: Serves as the primary architectural base, offering a backend-centric, single-session implementation pattern that perfectly aligns with Delegate 1's requirements
The Twilio demo's architecture is particularly valuable as it already demonstrates:
- Centralized session management on the backend
- Multi-connection coordination (Twilio ↔ OpenAI ↔ Frontend)
- Real-time event streaming for observability
- Single session object managing multiple connection types
Key benefits of this approach:
- Conversation Continuity: Switch between text, voice, and phone seamlessly without losing context
- Unified Experience: One AI assistant that remembers your entire interaction history
- Real-Time Performance: Low-latency responses across all communication channels
- Scalable Architecture: Backend-centric design supports multiple concurrent sessions
- Extensible Design: Easy to add new communication channels or integrate with external systems
Delegate 1 is designed for scenarios where users need:
- Continuous assistance across different communication preferences
- Context-aware interactions that span multiple sessions
- Professional-grade AI assistance with phone call capabilities
- Real-time collaboration with voice and text integration
- Seamless handoffs between different interaction modalities
Prerequisites:
- Node.js (v18 or higher)
- npm or yarn
- OpenAI API key
- Twilio account (for phone call functionality)
The easiest way to get Delegate 1 running is to use our unified startup scripts:
# Clone the repository
git clone <repository-url>
cd delegate1
# Install all dependencies
npm run install:all
# Start both frontend and backend servers
./start.sh
Alternatively, install and run with npm from the repository root:
# Install dependencies for the root project
npm install
# Install dependencies for both frontend and backend
npm run install:all
# Start both servers in development mode
npm run dev
Or run each server in its own terminal:
# Terminal 1 - Backend (websocket-server)
cd websocket-server
npm install
npm run dev
# Terminal 2 - Frontend (webapp)
cd webapp
npm install
npm run dev
- Copy the environment files:
cp websocket-server/.env.example websocket-server/.env
cp webapp/.env.example webapp/.env
cp voice-client/.env.example voice-client/.env
- Configure your environment variables:
- OpenAI API key: Required for all AI functionality
- Twilio credentials: Required for phone call functionality
- Public URL: Required for Twilio webhook integration (see Twilio Setup below)
To enable phone call functionality with Twilio, follow these steps:
ngrok is required to make your local server accessible to Twilio webhooks:
# Install ngrok globally (already done if you followed Quick Start)
npm install -g ngrok
# Start your websocket-server first
npm run backend:dev
# In a new terminal, expose your server via ngrok
ngrok http 8081
ngrok will provide you with a public URL like: https://abc123.ngrok.io
Update your websocket-server/.env file:
# Your ngrok URL (without trailing slash)
PUBLIC_URL=https://abc123.ngrok.io
# Your OpenAI API key
OPENAI_API_KEY=your_openai_api_key_here
# Twilio credentials (optional for basic testing)
TWILIO_ACCOUNT_SID=your_account_sid
TWILIO_AUTH_TOKEN=your_auth_token
# Optional default number to send SMS to when none is detected
TWILIO_SMS_DEFAULT_TO=+15555555555
- Go to Twilio Console
- Navigate to Phone Numbers → Manage → Active numbers
- Select your Twilio phone number
- Set the webhook URL to: https://your-ngrok-url.ngrok.io/twiml
- Set HTTP method to POST
- Save the configuration
When ngrok gives you a new URL, you can auto-update the TwiML app using the following command:
npm run script:update-app
This script reads TWILIO_TWIML_APP_SID and PUBLIC_URL from websocket-server/.env and sets the TwiML App Voice URL to ${PUBLIC_URL}/twiml (AU1 region, edge: sydney).
Optional custom env path:
node scripts/twilio/update-twiml-app.js --env path/to/.env
Delegate 1 includes a collection of utility scripts for managing Twilio integration and debugging. All scripts are organized in the /scripts/ directory:
# Generate fresh Twilio access token
npm run script:token
# List all TwiML Applications
npm run script:list-apps
# Inspect current TwiML Application configuration
npm run script:inspect-app
# Debug token issues
npm run script:validate-token
npm run script:test-api-key
- /scripts/twilio/ - TwiML Application and token management
- /scripts/debug/ - Debugging and testing utilities
For detailed documentation, see scripts/README.md.
- Start all services:
npm run dev
- Make a test call:
- Call your Twilio phone number
- The call should connect to your Delegate 1 backend
- Monitor logs in your frontend or terminal
- ngrok session expired: Restart ngrok and update your Twilio webhook URL
- Webhook not receiving calls: Verify the webhook URL format and HTTP method
- Connection issues: Check that your websocket-server is running on port 8081
- Audio problems: Ensure your OpenAI API key is valid and has sufficient credits
- Frontend: http://localhost:3000
- Backend API: http://localhost:8081
- WebSocket (chat + observability): ws://localhost:8081/chat
- Voice Client: http://localhost:3001
- Voice Message Miniapp: http://localhost:3000/miniapps/voice_message_tester/index.html
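For a quick smoke test of the observability stream, you can connect to the chat WebSocket from Node and print whatever events arrive. This is only a sketch: it assumes the ws package is installed and makes no assumptions about the event shapes, which are not documented here.
// Sketch: watch the chat + observability WebSocket (requires: npm i ws)
import WebSocket from 'ws';

const ws = new WebSocket('ws://localhost:8081/chat');
ws.on('open', () => console.log('connected to /chat'));
ws.on('message', (data) => console.log('event:', data.toString()));
ws.on('close', () => console.log('disconnected'));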
Development defaults write runtime data inside the repo, which persists locally between runs:
- Notes: websocket-server/runtime-data/notes.json
- SQLite DB: websocket-server/runtime-data/db/assistant.sqlite
In containerized deployments, the container filesystem is ephemeral. To persist data across restarts, mount a volume and configure paths using environment variables:
- RUNTIME_DATA_DIR: Base directory for runtime data (recommended). If set, defaults become:
  - Notes at ${RUNTIME_DATA_DIR}/notes.json
  - SQLite DB at ${RUNTIME_DATA_DIR}/db/assistant.sqlite
- SESSION_HISTORY_LIMIT: Number of past conversations returned by the API/WS replay (default 3, max 50). Increase in production if you want to see more history by default.
Example Docker Compose service:
services:
  websocket-server:
    image: your-org/delegate1-websocket-server:latest
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - RUNTIME_DATA_DIR=/runtime-data
      - SESSION_HISTORY_LIMIT=20
    volumes:
      - ./data/runtime-data:/runtime-data
    ports:
      - "8081:8081"
Example Kubernetes (conceptual):
- Create a PersistentVolumeClaim and mount it at /runtime-data.
- Set RUNTIME_DATA_DIR=/runtime-data in env.
- Optionally set SESSION_HISTORY_LIMIT=20.
From the root directory:
- npm run dev - Start backend, frontend, and voice client in development mode
- npm run dev:core - Start only backend and frontend (without voice client)
- npm run start - Start both servers in production mode
- npm run build - Build backend, frontend, and voice client
- npm run install:all - Install dependencies for all projects
- npm run clean - Clean all node_modules and build artifacts
- npm run voice-client:dev - Start only the voice client
- ./start.sh - Quick startup script with status messages
The backend exposes a multipart REST endpoint for short audio uploads:
POST /api/voice/message
It runs STT → chat (same pipeline as text/SMS) → TTS and returns transcript, assistant text, and base64 MP3. See docs/voice-message-api.md for full request/response and error details.
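As a rough illustration, a Node 18+ client might call it like this; the multipart field name (audio) and the response property names (transcript, text, audioBase64) are assumptions for this sketch — the authoritative contract is in docs/voice-message-api.md.
// Sketch only: field/property names are assumptions; see docs/voice-message-api.md.
import { readFile, writeFile } from 'node:fs/promises';

async function sendVoiceMessage(path: string) {
  const form = new FormData();
  form.append('audio', new Blob([await readFile(path)]), 'message.wav'); // field name assumed

  const res = await fetch('http://localhost:8081/api/voice/message', { method: 'POST', body: form });
  const body = await res.json(); // assumed shape: { transcript, text, audioBase64 }
  console.log('transcript:', body.transcript, '| assistant:', body.text);
  await writeFile('reply.mp3', Buffer.from(body.audioBase64, 'base64'));
}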
These tests exercise the live WebSocket chat flow end-to-end, using real OpenAI endpoints (no mocks). They will be skipped automatically if the backend is not reachable.
Prerequisites:
- Backend running: ws://localhost:8081 (see npm run backend:dev)
- websocket-server/.env contains a valid OPENAI_API_KEY
- Install Playwright browsers (one-time):
npx playwright install chromium
Run tests:
npm run test:e2e
Notes:
- Tests connect to ws://localhost:8081/chat and assert behavior from assistant responses. The first test asks for the assistant's name and requires the response to include HK-47 (the persona is defined in websocket-server/src/agentConfigs/personality.ts).
- If you want headed UI for browser tests later, use npm run test:e2e:headed (not required for WS-only tests).
- Start both servers:
npm run backend:dev
npm run frontend:dev
- Install browsers once (if not already):
npx playwright install chromium
- Run only the UI test (headless):
npm run test:e2e -- tests/e2e/ui-notes.spec.ts
- Run headed (watch the browser):
npm run test:e2e:headed -- tests/e2e/ui-notes.spec.ts
Environment overrides (optional):
FRONTEND_PORT=3001 PORT=8082 npm run test:e2e -- tests/e2e/ui-notes.spec.ts
Note: If TypeScript complains about Node globals in tests, install Node types in the root dev deps:
npm i -D @types/node
For the voice client to work, you need a Twilio access token:
# Generate a new access token
node generate-token.js
# Copy the generated token to voice-client/.env
# Update VITE_TWILIO_ACCESS_TOKEN with the new token
generate-token.js includes the region specification:
const token = new AccessToken(
  config.accountSid,
  config.apiKeySid,
  config.apiKeySecret,
  {
    identity: config.identity,
    region: 'au1' // Critical for AU1 region accounts
  }
);
Without the region specification, you'll get AccessTokenInvalid (20101) errors.
The phone/voice experience uses OpenAI Realtime with server-side voice activity detection (VAD). You can tune how sensitive it is to user speech and when it interrupts assistant speech ("barge-in").
- Location: websocket-server/src/session/call.ts
- Constants near the top of the file control sensitivity and interruption behavior:
// Voice Activity Detection (VAD) and Barge-in Configuration
const VAD_TYPE: 'server_vad' | 'semantic_vad' | 'none' = 'server_vad';
const VAD_THRESHOLD: number = 0.6; // higher = less sensitive
const VAD_PREFIX_PADDING_MS: number = 80; // speech required before start
const VAD_SILENCE_DURATION_MS: number = 300; // silence required to end
const BARGE_IN_GRACE_MS: number = 300; // ms of assistant audio before interruption allowed
- How it's applied
  - In establishRealtimeModelConnection() we send a session.update with turn_detection built from those constants.
  - Example payload excerpt (from call.ts):
jsonSend(session.modelConn, {
  type: 'session.update',
  session: {
    modalities: ['text', 'audio'],
    turn_detection: {
      type: VAD_TYPE,
      threshold: VAD_THRESHOLD,
      prefix_padding_ms: VAD_PREFIX_PADDING_MS,
      silence_duration_ms: VAD_SILENCE_DURATION_MS,
    },
    // ...
  },
});
- Barge-in grace period
  - In processRealtimeModelEvent() we only truncate assistant speech on input_audio_buffer.speech_started after at least BARGE_IN_GRACE_MS of assistant audio has played. Increase this to reduce abrupt cutoffs; set it to 0 for immediate barge-in.
- Runtime overrides via UI (optional)
  - The web UI "Session Settings" dialog sends a session.update via the chat WebSocket (/chat). The server stores this in session.saved_config and merges it into the model session on connect. If you include a turn_detection object there, it overrides the constants at runtime (see the sketch after this list).
- Tuning tips
  - Make it less sensitive to background noise: increase VAD_THRESHOLD (e.g., 0.7–0.8) and/or VAD_PREFIX_PADDING_MS (e.g., 120–200ms).
  - Reduce premature turn endings: increase VAD_SILENCE_DURATION_MS (e.g., 400–600ms).
  - Avoid instant barge-in: increase BARGE_IN_GRACE_MS (e.g., 500–800ms).
  - Fields that your model version does not support are safely ignored by the API.
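For the runtime override path above, the sketch below shows the kind of message the Session Settings dialog sends. The envelope mirrors the session.update payload shown earlier, but the exact shape the chat WebSocket accepts is an assumption here — check the webapp's Session Settings code before relying on it.
// Sketch: override VAD/barge-in settings at runtime over the chat WebSocket (requires: npm i ws).
// The { type: 'session.update', session: { turn_detection } } envelope is assumed, not verified.
import WebSocket from 'ws';

const ws = new WebSocket('ws://localhost:8081/chat');
ws.on('open', () => {
  ws.send(JSON.stringify({
    type: 'session.update',
    session: {
      turn_detection: {
        type: 'server_vad',
        threshold: 0.7,            // less sensitive to background noise
        prefix_padding_ms: 150,
        silence_duration_ms: 500,
      },
    },
  }));
});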
Delegate 1 can discover and use tools exposed by remote MCP servers via the MCP Streamable HTTP transport. Discovered MCP tools are made available to the supervisor agent automatically and can be called like any other function tool.
- File path (created on first use): websocket-server/runtime-data/mcp-servers.json
- Shape: a JSON array of MCP server descriptors
- Supported type values: only "streamable-http" is supported at the moment
The backend will create runtime-data/mcp-servers.json if it does not exist and default it to [].
Each entry must follow this shape (validated by websocket-server/src/config/mcpConfig.ts):
[
  {
    "type": "streamable-http",          // required; only this type is supported currently
    "url": "https://host.example/mcp",  // required; full URL of MCP Streamable HTTP endpoint
    "name": "my-mcp",                   // required; unique server name used for namespacing
    "headers": {                        // optional; custom headers sent to the MCP server
      "Authorization": "Bearer <token>",
      "X-Custom": "value"
    }
  }
]
If the JSON is invalid or any required field is missing, the server will reject the update with a helpful error message.
- On startup and whenever the config changes, the backend runs MCP discovery:
  - Code: websocket-server/src/tools/mcp/adapter.ts → initMCPDiscovery() → performDiscovery()
  - It loads servers via getMcpConfig() from websocket-server/src/config/mcpConfig.ts.
  - For each server, it connects using @modelcontextprotocol/sdk's Streamable HTTP client (client.ts).
  - It lists tools and converts them to OpenAI function-style schemas.
  - Tool names are namespaced as mcp.{serverName}.{toolName}.
  - All discovered tools are injected into the supervisor agent via updateSupervisorMcpTools().
- After discovery, the central tools registry is rebuilt so the supervisor is allowed to call these tools (a simplified sketch of the discovery flow follows the file list below).
Relevant files:
- websocket-server/src/config/mcpConfig.ts (JSON read/validate/write)
- websocket-server/src/server/routes/mcpConfig.ts (REST API to view/update config; triggers reload)
- websocket-server/src/tools/mcp/client.ts (connect/list/call remote tools)
- websocket-server/src/tools/mcp/adapter.ts (discovery, namespacing, and registration)
- websocket-server/src/server/startup/init.ts (startup + reload sequence)
- websocket-server/src/agentConfigs/supervisorAgentConfig.ts (wires discovered tools to the supervisor)
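The sketch below shows the general pattern the adapter follows (connect with the SDK's Streamable HTTP client, list tools, namespace them); it is a simplified illustration under the assumption of the standard @modelcontextprotocol/sdk client API, not the adapter's actual code.
// Simplified discovery sketch; the real logic lives in websocket-server/src/tools/mcp/adapter.ts.
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';

async function discover(server: { name: string; url: string; headers?: Record<string, string> }) {
  const client = new Client({ name: 'delegate1-discovery', version: '1.0.0' });
  const transport = new StreamableHTTPClientTransport(new URL(server.url), {
    requestInit: { headers: server.headers },
  });
  await client.connect(transport);

  const { tools } = await client.listTools();
  // Namespace each tool as mcp.{serverName}.{toolName} before handing it to the supervisor.
  return tools.map((t) => ({
    name: `mcp.${server.name}.${t.name}`,
    description: t.description,
    parameters: t.inputSchema,
  }));
}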
The backend exposes a small REST API to manage the JSON config. After a successful update, discovery is forced and the registry is rebuilt automatically.
- GET http://localhost:8081/api/mcp/config
  - Response: { text: string, servers: RemoteServerConfig[] }
- POST http://localhost:8081/api/mcp/config
  - Body: { "text": "<raw JSON string>" }
  - Validates JSON, writes to runtime-data/mcp-servers.json, forces rediscovery, and returns { status: 'updated', servers }.
Example using curl:
# Read current config
curl -s http://localhost:8081/api/mcp/config | jq .
# Update config (inline JSON)
curl -s -X POST http://localhost:8081/api/mcp/config \
-H 'Content-Type: application/json' \
-d '{
"text": "[{\n \"type\": \"streamable-http\",\n \"url\": \"https://host.example/mcp\",\n \"name\": \"my-mcp\",\n \"headers\": {\n \"Authorization\": \"Bearer sk-example\"\n }\n}]"
}' | jq .
Notes:
- The route is unauthenticated in development; if you expose the backend publicly, put it behind auth or a network boundary.
- Only streamable-http servers are supported at this time.
- Headers are forwarded as provided to the MCP server on connect/calls. Avoid committing secrets to source control; prefer injecting tokens via your deployment's secret management and updating the JSON through the POST endpoint at runtime.
- Server metadata such as name/description/version is obtained from the MCP server during initialization (the initialize result's serverInfo). Any description/note fields in the JSON config are ignored.
Save the following to websocket-server/runtime-data/mcp-servers.json or POST it via the REST API.
[
  {
    "type": "streamable-http",
    "url": "https://mcp.tools.yourcompany.com/api/mcp",
    "name": "corp-tools",
    "headers": {
      "Authorization": "Bearer ${MCP_CORP_TOOLS_TOKEN}"
    }
  },
  {
    "type": "streamable-http",
    "url": "https://public.example/mcp",
    "name": "public-demo",
    "headers": {
      "X-Env": "demo"
    }
  }
]
After saving, watch the backend logs for lines like:
[startup] MCP discovery initialized
[mcpAdapter] MCP discovery complete. 7 tool(s) registered.
When the supervisor decides to use a tool, it will see names like mcp.corp-tools.search or mcp.public-demo.fetch. You can also trigger them via the function-calling path programmatically by referring to their namespaced schema names.
Delegate 1 uses a centralized tools registry that allows you to control which tools each agent can access. Agent policies define "allow lists" that filter the available tools from the catalog.
- Code-defined defaults: Each agent config (e.g., websocket-server/src/agentConfigs/baseAgent.ts) defines default tools
- Runtime overrides: You can modify tool allow lists via the webapp UI at /settings?tab=catalog
- Persistent storage: Changes are saved to websocket-server/runtime-data/agent-policies.json
- Merge behavior: On startup, persisted policies override code defaults
Each agent has a policy with two filter mechanisms:
{
  "allowNames": ["tool_name_1", "tool_name_2"], // Explicit tool names
  "allowTags": ["supervisor-allowed", "base-default"] // Tool tags
}
- allowNames: Explicit list of tool names (e.g., "create_note", "mcp.real-browser.anchor_navigate")
- allowTags: Tags that tools are registered with (e.g., "supervisor-allowed" for web_search)
A tool is available to an agent if it matches either an allowed name or an allowed tag.
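In other words, the check is a simple union of the two lists. A minimal sketch of that logic (illustrative, not the registry's actual implementation):
// Illustrative: a tool is visible to an agent if its name or one of its tags is allowed.
interface ToolMeta { name: string; tags: string[] }
interface AgentPolicy { allowNames: string[]; allowTags: string[] }

function isToolAllowed(tool: ToolMeta, policy: AgentPolicy): boolean {
  return policy.allowNames.includes(tool.name)
    || tool.tags.some((tag) => policy.allowTags.includes(tag));
}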
To update an agent's allow list from the webapp:
- Navigate to Settings → Tools in the webapp
- Scroll to the agent section (e.g., "Supervisor Agent")
- Use the dropdown to add tools to the allow list
- Click "Save allow list"
- Changes persist to runtime-data/agent-policies.json
GET /agents/:id/policy - View current policy (via /agents endpoint)
curl -s http://localhost:8081/agents | jq '.supervisor.policy'
PATCH /agents/:id/policy - Update policy
curl -X PATCH http://localhost:8081/agents/supervisor/policy \
-H 'Content-Type: application/json' \
-d '{
"allowNames": ["create_note", "mcp.real-browser.anchor_navigate"],
"allowTags": ["supervisor-allowed"]
}'
- File: websocket-server/runtime-data/agent-policies.json
- Format: JSON object mapping agent IDs to policies
- Docker/K8s: Respects RUNTIME_DATA_DIR environment variable
Example agent-policies.json:
{
  "base": {
    "allowNames": ["get_weather", "escalate_to_supervisor"],
    "allowTags": ["base-default"]
  },
  "supervisor": {
    "allowNames": ["create_note", "mcp.real-browser.anchor_navigate"],
    "allowTags": ["supervisor-allowed"]
  }
}
- MCP tools are NOT auto-allowed: Discovered MCP tools must be explicitly added to the allow list
- Builtin tools use tags: Tools like web_search are tagged with supervisor-allowed and included via allowTags
- No policy file: If the file doesn't exist, agents use their code-defined defaults
These debug/inspection endpoints expose the canonical tools catalog and the agent-specific tool visibility as assembled by the centralized registry in websocket-server/src/tools/registry.ts and mounted in websocket-server/src/server/routes/catalog.ts.
- GET /tools
  - Back-compat list of raw function schemas from websocket-server/src/functionHandlers.ts (which delegates to agentConfigs).
  - Example: curl -s http://localhost:8081/tools | jq .
- GET /catalog/tools
  - Canonical tools catalog with metadata from the centralized registry (local, MCP, and built-ins).
  - Fields: id, name, sanitizedName, origin, tags, description.
  - Example: curl -s http://localhost:8081/catalog/tools | jq .
- GET /agents
  - Agents debug view with exposure policies and resolved tool names.
  - Example: curl -s http://localhost:8081/agents | jq .
- GET /agents/:id/tools
  - Tools available to a specific agent in OpenAI Responses API "tools" format.
  - For built-ins (e.g., web search), entries look like { "type": "web_search" }.
  - For functions, entries look like { "type": "function", "name": "<sanitizedName>", "description": "...", "parameters": { ... }, "strict": false }.
  - Example (replace supervisor with your agent id): curl -s http://localhost:8081/agents/supervisor/tools | jq .
Notes:
- These endpoints are intended for development/observability. If you expose the backend publicly, secure them appropriately.
- MCP tools are discovered at startup and after successful updates via POST /api/mcp/config (see section above). The catalog reflects the current registry state without needing a server restart.
[Development guidelines and contribution information will be added here].
[License information will be added here]