Open
22 commits
fbfcb6b
Adding agent websocket for web and mobile
chongzluong Sep 25, 2025
0d54454
Fixing fern build
chongzluong Sep 25, 2025
f07da06
Whoops it's wss not ws
chongzluong Sep 25, 2025
876c2d5
Event names, params, and layout changes
chongzluong Sep 25, 2025
c0f952b
Updating to web call page, modifying description & diagram
chongzluong Sep 25, 2025
c1da309
Updating media event to distinguish between input and output
chongzluong Sep 25, 2025
fbc6914
Updating with keepalive and timeout information
chongzluong Sep 26, 2025
7f11c5a
Removing output format from the config
chongzluong Sep 26, 2025
2816859
Updating docs with specific metadata fields
chongzluong Sep 26, 2025
9f02d21
Update fern/agents/integrations/web-calls.mdx
chongzluong Sep 26, 2025
9c55dc0
Update fern/agents/integrations/web-calls.mdx
chongzluong Sep 26, 2025
9fe8493
Update fern/agents/integrations/web-calls.mdx
chongzluong Sep 26, 2025
8313ea0
Update fern/agents/integrations/web-calls.mdx
chongzluong Sep 26, 2025
ddbf0e6
Update fern/agents/integrations/web-calls.mdx
chongzluong Sep 26, 2025
f812447
Update fern/agents/integrations/web-calls.mdx
chongzluong Sep 26, 2025
d224a4a
Update fern/agents/integrations/web-calls.mdx
chongzluong Sep 26, 2025
7ec9d48
Update fern/agents/integrations/web-calls.mdx
chongzluong Sep 26, 2025
cba41cd
Update fern/agents/integrations/web-calls.mdx
chongzluong Sep 26, 2025
6e012d2
Update fern/agents/integrations/web-calls.mdx
chongzluong Sep 26, 2025
958ca4a
Update fern/agents/integrations/web-calls.mdx
chongzluong Sep 26, 2025
2eba66c
Updating with nits
chongzluong Sep 26, 2025
5a617f5
idk when the numbering changed
chongzluong Sep 26, 2025
247 changes: 247 additions & 0 deletions fern/agents/integrations/web-calls.mdx
@@ -0,0 +1,247 @@
# Web Calls

The Agents WebSocket provides real-time, bidirectional communication between web clients and Cartesia voice agents. It enables streaming audio input and real-time agent responses for browser-based or custom applications.

```mermaid
graph LR
A1[Web Client] <-->|General Websocket Events| B[Cartesia API]
B <-->|Agent Events| C[Cartesia Agents]
```

## Connection

Connect to the WebSocket endpoint:

```
wss://api.cartesia.ai/agents/stream/{agent_id}
```

**Headers:**

| Header | Value |
|--------|-------|
| `Authorization` | `Bearer {your_api_key}` |
Collaborator
Token, not API key.

Contributor Author
Oh i thought it could be either - i'll update tho.

| `Cartesia-Version` | `2025-04-16` |

## Protocol Overview

The WebSocket connection uses JSON messages for control events and base64-encoded audio for media. The lifecycle follows this sequence:

1. **Client → Server:** Send a start event to initialize the stream.
2. **Server → Client:** Receive an ack event confirming configuration.
3. **Bidirectional exchange:** The client and server exchange streaming audio and control events until either side closes the connection or the inactivity timeout fires.
4. **Close:** Either side ends the session with a standard WebSocket close frame.

If the client doesn’t provide a `stream_id` in the initial `start` event, the server generates one and returns it in the `ack` response.
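As a rough illustration of step 2, the helper below checks that the first server message is an `ack` and returns the `stream_id` to use for the rest of the session. The function name `stream_id_from_ack` is hypothetical and not part of any Cartesia SDK; it only assumes the `ack` shape documented on this page.

```python
import json


def stream_id_from_ack(raw_message: str) -> str:
    """Validate the first server message and return its stream_id.

    Hypothetical helper: assumes the ack JSON shape shown in this document.
    """
    message = json.loads(raw_message)
    if message.get("event") != "ack":
        raise ValueError(f"expected 'ack', got {message.get('event')!r}")
    # The server echoes the client's stream_id, or returns one it generated.
    return message["stream_id"]
```

A client would typically store the returned value and attach it to every later `media_input`, `dtmf`, and `custom` event.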

## Client Events

### Start Event

Initializes the audio stream configuration.
- The `config` parameter optionally overrides the input audio settings from your agent's default configuration.
- Set `stream_id` yourself if you want to track the stream on the client side for observability. If you omit it, the server generates one and returns it in the `ack` event.

**This must be the first message sent.**

```json
{
  "event": "start",
  "stream_id": "unique_id",
  "config": {
    "input_format": "pcm_44100"
  },
  "metadata": {
    "to": "user@example.com",
    "from": "+1234567890"
  }
}
```

**Fields:**
- `stream_id` (optional): Stream identifier. If not provided, the server generates one
- `config.input_format`: Audio format for client audio input (`mulaw_8000`, `pcm_16000`, `pcm_24000`, `pcm_44100`)
- `metadata` (optional): Custom metadata object. Its fields are passed through to the user code, and the following fields have special meaning:
  - `to` (optional): Destination identifier for call routing (defaults to the agent ID)
  - `from` (optional): Source identifier for the call (defaults to `"websocket"`)
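The fields above can be assembled with a small builder. This is a minimal sketch, assuming only the field names and the four input formats listed here; `build_start_event` and `SUPPORTED_INPUT_FORMATS` are illustrative names, not part of an official client.

```python
import json
from typing import Optional

# Input formats listed in this document.
SUPPORTED_INPUT_FORMATS = {"mulaw_8000", "pcm_16000", "pcm_24000", "pcm_44100"}


def build_start_event(
    input_format: Optional[str] = None,
    stream_id: Optional[str] = None,
    metadata: Optional[dict] = None,
) -> str:
    """Serialize a start event, omitting optional fields that are not set."""
    if input_format is not None and input_format not in SUPPORTED_INPUT_FORMATS:
        raise ValueError(f"unsupported input_format: {input_format!r}")
    event: dict = {"event": "start"}
    if stream_id is not None:
        event["stream_id"] = stream_id
    if input_format is not None:
        event["config"] = {"input_format": input_format}
    if metadata is not None:
        event["metadata"] = metadata
    return json.dumps(event)
```

Rejecting unknown formats on the client is a design choice for earlier feedback; the server remains the authority on what it accepts.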

### Media Input Event

Audio data sent from the client to the server. The `payload` audio data must be base64-encoded.

```json
{
  "event": "media_input",
  "stream_id": "unique_id",
  "media": {
    "payload": "base64_encoded_audio_data"
  }
}
```

**Fields:**
- `stream_id`: Stream identifier returned in the `ack` response
- `media.payload`: Base64-encoded audio data in the format specified in the start event
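For illustration, a raw audio chunk can be wrapped into a `media_input` message as follows. The helper name `build_media_input` is hypothetical; the chunk is assumed to already be in the format declared in the `start` event.

```python
import base64
import json


def build_media_input(stream_id: str, audio_chunk: bytes) -> str:
    """Base64-encode raw audio bytes and wrap them in a media_input event."""
    payload = base64.b64encode(audio_chunk).decode("ascii")
    return json.dumps(
        {
            "event": "media_input",
            "stream_id": stream_id,
            "media": {"payload": payload},
        }
    )
```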

### DTMF Event

Sends DTMF (dual-tone multi-frequency) tones.

```json
{
  "event": "dtmf",
  "stream_id": "example_id",
  "dtmf": "1"
}
```

**Fields:**
- `stream_id`: Stream identifier
- `dtmf`: DTMF digit (0-9, *, #)
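A client may want to validate the digit before sending. The sketch below assumes one digit per event, as in the example above; `build_dtmf_event` is an illustrative name.

```python
import json

# Digits accepted per this document: 0-9, *, #
VALID_DTMF_DIGITS = set("0123456789*#")


def build_dtmf_event(stream_id: str, digit: str) -> str:
    """Serialize a single-digit dtmf event after validating the digit."""
    if digit not in VALID_DTMF_DIGITS:
        raise ValueError(f"invalid DTMF digit: {digit!r}")
    return json.dumps({"event": "dtmf", "stream_id": stream_id, "dtmf": digit})
```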

### Custom Event

Sends custom metadata to the agent.
Collaborator
Probably should explain how this shows up to agent code?

Contributor Author
Heh it does not yet show up, we have a ticket for this outstanding


```json
{
  "event": "custom",
  "stream_id": "example_id",
  "metadata": {
    "user_id": "user123",
    "session_info": "custom_data"
  }
}
```

**Fields:**
- `stream_id`: Stream identifier
- `metadata`: Object containing key-value pairs of custom data

## Server Events

### Ack Event

Server acknowledgment of the start event, confirming stream configuration.
If `stream_id` wasn't provided in the initial `start` event, the client obtains the server-generated `stream_id` here.

```json
{
  "event": "ack",
  "stream_id": "example_id",
  "config": {
    "input_format": "pcm_44100"
  }
}
```

### Media Output Event

Agent audio sent from the server to the client. `payload` is base64-encoded audio data.

```json
{
  "event": "media_output",
  "stream_id": "example_id",
  "media": {
    "payload": "base64_encoded_audio_data"
  }
}
```

### Clear Event

Indicates the agent wants to clear/interrupt the current audio stream.

```json
{
  "event": "clear",
  "stream_id": "example_id"
}
```

### DTMF Event

Server sends DTMF tones from the agent.

```json
{
  "event": "dtmf",
  "stream_id": "example_id",
  "dtmf": "5"
}
```

### Custom Event

Server sends custom metadata from the agent.

```json
{
  "event": "custom",
  "stream_id": "example_id",
  "metadata": {
    "agent_state": "processing",
    "confidence": 0.95,
    "custom_data": "value"
  }
}
```
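One possible way to consume the server events above is a small dispatcher that decodes `media_output` into a playback buffer and drops queued audio on `clear`. This is a sketch only: `handle_server_event` is a hypothetical name, and treating `clear` as "discard audio not yet played" is an assumption about client behavior, not something this page specifies.

```python
import base64
import json
from typing import Optional


def handle_server_event(raw_message: str, playback: bytearray) -> Optional[str]:
    """Apply one server message to the playback buffer; return the event name."""
    message = json.loads(raw_message)
    event = message.get("event")
    if event == "media_output":
        # Decode agent audio and queue it for playback.
        playback.extend(base64.b64decode(message["media"]["payload"]))
    elif event == "clear":
        # Assumption: an interruption means queued audio should be discarded.
        playback.clear()
    # ack, dtmf, and custom events are left for the caller to handle.
    return event
```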

## Connection Management

### Inactivity Timeout

The server automatically closes idle WebSocket connections after **30 seconds** of inactivity. Activity is defined as receiving any message from the client, including:
Contributor
Suggested change
The server automatically closes idle WebSocket connections after **30 seconds** of inactivity. Activity is defined as receiving any message from the client, including:
The server closes idle WebSocket connections after **30 seconds** without client activity. Any client message counts as activity, including:


- Application messages (media_input, dtmf, custom events)
- Standard WebSocket ping frames
- Any other valid WebSocket message

When the timeout occurs, the connection is closed with:
- **Code:** 1000 (Normal Closure)
- **Reason:** `"connection idle timeout"`

### Ping/Pong Keepalive

To prevent inactivity timeouts during periods of silence, use standard WebSocket ping frames for periodic keepalive:
Contributor
Suggested change
To prevent inactivity timeouts during periods of silence, use standard WebSocket ping frames for periodic keepalive:
Send periodic WebSocket ping frames to keep the connection alive:

Contributor
periods of silence is misleading

Contributor Author
@sauhardjain i think the issue is we don't want folks to keep this alive indefinitely without reason. Although we'll get paid (lol) they'll probably get pissed


```python
# Client sends ping to reset inactivity timer
pong_waiter = await websocket.ping()
latency = await pong_waiter
```

```javascript
// JavaScript example
setInterval(() => {
  if (websocket.readyState === WebSocket.OPEN) {
    websocket.ping();
Contributor
websocket.ping() isn't supported on browsers, so we should make this example specific to Node.js

  }
}, 20000); // Send ping every 20 seconds
Contributor
Suggested change
}, 20000); // Send ping every 20 seconds
}, 20000); // every 20s to avoid 30s idle timeout

```

Contributor
Add browser-specific example:

// Browser example: send a custom keepalive event
setInterval(() => {
  if (websocket.readyState === WebSocket.OPEN) {
    websocket.send(JSON.stringify({
      event: "custom",
      stream_id: "unique_id",
      metadata: { type: "heartbeat" }
    }));
  }
}, 20000); // every 20s to avoid 30s idle timeout

The server automatically responds to ping frames with pong frames and resets the inactivity timer upon receiving any message.
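The keepalive can also be run as a background task. The sketch below assumes a connection object whose `ping()` coroutine returns an awaitable that resolves when the pong arrives, as in the Python `websockets` asyncio client; `keepalive` and `interval_s` are illustrative names.

```python
import asyncio


async def keepalive(websocket, interval_s: float = 20.0) -> None:
    """Ping periodically until the task is cancelled.

    Assumes a websockets-style API: `await websocket.ping()` returns an
    awaitable that completes when the matching pong is received.
    """
    while True:
        await asyncio.sleep(interval_s)
        pong_waiter = await websocket.ping()
        await pong_waiter  # resolves once the server has answered
```

A caller would start it with `asyncio.create_task(keepalive(websocket))` after the `ack` arrives and cancel the task when the call ends, so the connection is not kept open longer than needed.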

### Connection Close

The connection can be closed by either the client or server using WebSocket close frames.

**Client-initiated close:**
```python
await websocket.close(code=1000, reason="session completed")
```

**Server-initiated close:**
When the agent ends the call, the server closes the connection with:
- **Code:** 1000 (Normal Closure)
- **Reason:** `"call ended by agent"` or `"call ended by agent, reason: {specific_reason}"` if additional context is available
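Because both the idle timeout and an agent-initiated hangup use code 1000, a client has to inspect the reason string to tell them apart. The following is a minimal sketch based only on the reason strings documented on this page; `classify_close` and its return labels are hypothetical.

```python
from typing import Optional, Tuple

IDLE_REASON = "connection idle timeout"
AGENT_PREFIX = "call ended by agent"


def classify_close(code: int, reason: str) -> Tuple[str, Optional[str]]:
    """Return (kind, detail) for a close frame's code and reason."""
    if code == 1000 and reason == IDLE_REASON:
        return ("idle_timeout", None)
    if code == 1000 and reason.startswith(AGENT_PREFIX):
        # The optional suffix has the form ", reason: {specific_reason}".
        _, separator, detail = reason.partition(", reason: ")
        return ("agent_ended", detail if separator else None)
    return ("other", None)
```

Matching on exact strings is brittle if the server wording changes, so a production client might log unrecognized reasons rather than fail on them.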

## Best Practices

1. **Always send start event first** - The connection will be closed if any other event is sent before start
2. **Use appropriate audio formats** - Match your input format to your audio source capabilities. For telephony providers this is often `mulaw_8000`, while for web clients this will often be `pcm_44000`
Collaborator
IMO we should recommend 16k

Contributor
@noahlt could you elaborate on the 16k recommendation? Why is it better and for which use case?

Contributor Author
@noahlt the input gets resampled to 16k by default in our Pipecat pipeline since that's what our STT takes, independent of what the input Transport takes.

But right now input Transport also informs our output (we have a ticket to fix this)

3. **Handle connection close gracefully** - Monitor close events and reasons for debugging
4. **Implement keepalive for calls with longer periods of silence** - Send WebSocket ping frames every 20-25 seconds to prevent the 30-second inactivity timeout during periods of silence
5. Send your own stream_id's for the best observability
6. Always handle timeout closures (`1000 / connection idle timeout`) by reconnecting and resending a `start` event.
Comment on lines +242 to +247
Contributor
Suggested change
1. **Always send start event first** - The connection will be closed if any other event is sent before start
2. **Use appropriate audio formats** - Match your input format to your audio source capabilities. For telephony providers this is often `mulaw_8000`, while for web clients this will often be `pcm_44000`
3. **Handle connection close gracefully** - Monitor close events and reasons for debugging
4. **Implement keepalive for calls with longer periods of silence** - Send WebSocket ping frames every 20-25 seconds to prevent the 30-second inactivity timeout during periods of silence
5. Send your own stream_id's for the best observability
6. Always handle timeout closures (`1000 / connection idle timeout`) by reconnecting and resending a `start` event.
1. **Send `start` first** - The connection closes if any other event is sent before `start`.
1. **Choose the right audio format** - Match the format to your source: `mulaw_8000` for telephony, `pcm_44100` for web clients.
1. **Handle closes cleanly** — Always capture close codes and reasons for debugging and recovery.
1. **Keep the connection alive** - Send WebSocket ping frames every 20-25 seconds to avoid the 30-second inactivity timeout.
1. **Manage stream IDs** — Provide your own `stream_id` values to improve observability across systems.
1. **Recover from idle timeouts** — On `1000 / connection idle timeout`, reconnect and resend a `start` event.

3 changes: 3 additions & 0 deletions fern/versions/2025-04-16.yml
@@ -200,6 +200,9 @@ navigation:
path: ../agents/integrations/telephony/agents-telephony-overview.mdx
- page: Outbound Dialing
path: ../agents/integrations/telephony/agents-telephony-outbound.mdx
- page: Web Calls
path: ../agents/integrations/web-calls.mdx
icon: fa-solid fa-browser
- section: Infrastructure
contents:
- page: Deployments