Contributor

@chongzluong commented Sep 25, 2025

Overview

Some Cartesia users want to integrate their agents with their websites rather than with telephony. For them, we're documenting a more general WebSocket so they can handle the events an agent emits themselves and send in their own audio events.

Additions

Adding a new page under integrations (open to debate around placement here) for web calls.

@chongzluong changed the title from "[Agents] Adding agent websocket for web and mobile clients" to "[Agents] Adding agent websocket for web calls" on Sep 25, 2025
Co-authored-by: Sauhard Jain <sauhardjain03@gmail.com>
chongzluong and others added 4 commits September 25, 2025 23:31
chongzluong and others added 3 commits September 25, 2025 23:32
Collaborator

@noahlt left a comment

Should we also add an API reference for this?


| Header | Value |
|--------|-------|
| `Authorization` | `Bearer {your_api_key}` |
Collaborator

Token, not API key.

Contributor Author

Oh, I thought it could be either. I'll update it, though.
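A minimal sketch of building the header from the table above with an access token rather than an API key, as the review asks. It assumes a Node.js client on the `ws` package, whose `WebSocket` constructor accepts a `headers` option; the browser `WebSocket` constructor has no way to set custom headers, so browser clients need some other mechanism that this thread does not cover. `buildAuthHeaders` and the endpoint placeholder are hypothetical.

```javascript
// Hypothetical helper: builds the Authorization header from the table above.
// Per the review, the bearer value should be an access token, not an API key.
function buildAuthHeaders(accessToken) {
  if (typeof accessToken !== "string" || accessToken.length === 0) {
    throw new TypeError("accessToken must be a non-empty string");
  }
  return { Authorization: `Bearer ${accessToken}` };
}

// Usage with the Node.js `ws` package (browsers cannot set WebSocket headers).
// The URL below is a placeholder, not the real endpoint.
// const WebSocket = require("ws");
// const socket = new WebSocket("wss://<agent-websocket-endpoint>", {
//   headers: buildAuthHeaders(process.env.ACCESS_TOKEN),
// });
```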


### Custom Event

Sends custom metadata to the agent.
Collaborator

We should probably explain how this shows up in agent code?

Contributor Author

Heh, it doesn't show up yet; we have an outstanding ticket for this.
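A sketch of serializing a custom event, assuming the `{ event, stream_id, metadata }` shape used by the browser keepalive example later in this thread. `buildCustomEvent` is a hypothetical helper, and per the reply above, this metadata does not yet reach agent code.

```javascript
// Hypothetical helper: serializes a custom event using the field names from
// the browser keepalive example in this thread (event, stream_id, metadata).
function buildCustomEvent(streamId, metadata = {}) {
  if (typeof streamId !== "string" || streamId.length === 0) {
    throw new TypeError("streamId must be a non-empty string");
  }
  if (metadata === null || typeof metadata !== "object" || Array.isArray(metadata)) {
    throw new TypeError("metadata must be a plain object");
  }
  return JSON.stringify({ event: "custom", stream_id: streamId, metadata });
}
```

A caller would then do `socket.send(buildCustomEvent(streamId, { page: "checkout" }))`, where `page` is only an illustrative key.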

## Best Practices

1. **Always send start event first** - The connection will be closed if any other event is sent before start
2. **Use appropriate audio formats** - Match your input format to your audio source capabilities. For telephony providers this is often `mulaw_8000`, while for web clients this will often be `pcm_44000`
Collaborator

IMO we should recommend 16k

Contributor

@noahlt could you elaborate on the 16k recommendation? Why is it better and for which use case?

Contributor Author

@noahlt the input gets resampled to 16k by default in our Pipecat pipeline, since that's what our STT expects, regardless of what the input Transport accepts.

But right now the input Transport also informs our output (we have a ticket to fix this).
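If web clients follow the 16k recommendation instead of sending browser-rate audio, they have to downsample themselves. A minimal sketch, assuming the browser hands you `Float32Array` samples in [-1, 1] (as Web Audio does) and that the agent accepts 16-bit signed PCM at 16 kHz; whether such a format exists on this WebSocket, and under what name, is not stated in this thread. It uses plain linear interpolation with no low-pass filter, so it will alias; a production client would filter first. `downsampleToInt16` is a hypothetical name.

```javascript
// Naive linear-interpolation downsampler: Float32 samples in [-1, 1]
// (as Web Audio provides) -> 16-bit signed PCM at outputRate.
// No anti-aliasing filter is applied, so high frequencies will alias.
function downsampleToInt16(input, inputRate, outputRate = 16000) {
  if (outputRate > inputRate) {
    throw new RangeError("only downsampling is supported");
  }
  // Multiply before dividing so the length is exact for common rate pairs.
  const outLength = Math.floor((input.length * outputRate) / inputRate);
  // Int16Array uses platform byte order (little-endian on common hardware).
  const out = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = (i * inputRate) / outputRate; // fractional index into input
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    const sample = input[i0] * (1 - frac) + input[i1] * frac;
    const clamped = Math.max(-1, Math.min(1, sample));
    // Scale asymmetrically so -1 maps to -32768 and +1 maps to 32767.
    out[i] = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
  }
  return out;
}
```

With a Web Audio `AudioBuffer`, a call would look like `downsampleToInt16(buffer.getChannelData(0), buffer.sampleRate)`.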

Comment on lines +242 to +247
1. **Always send start event first** - The connection will be closed if any other event is sent before start
2. **Use appropriate audio formats** - Match your input format to your audio source capabilities. For telephony providers this is often `mulaw_8000`, while for web clients this will often be `pcm_44000`
3. **Handle connection close gracefully** - Monitor close events and reasons for debugging
4. **Implement keepalive for calls with longer periods of silence** - Send WebSocket ping frames every 20-25 seconds to prevent the 30-second inactivity timeout during periods of silence
5. Send your own stream_id's for the best observability
6. Always handle timeout closures (`1000 / connection idle timeout`) by reconnecting and resending a `start` event.
Contributor

Suggested change
1. **Always send start event first** - The connection will be closed if any other event is sent before start
2. **Use appropriate audio formats** - Match your input format to your audio source capabilities. For telephony providers this is often `mulaw_8000`, while for web clients this will often be `pcm_44000`
3. **Handle connection close gracefully** - Monitor close events and reasons for debugging
4. **Implement keepalive for calls with longer periods of silence** - Send WebSocket ping frames every 20-25 seconds to prevent the 30-second inactivity timeout during periods of silence
5. Send your own stream_id's for the best observability
6. Always handle timeout closures (`1000 / connection idle timeout`) by reconnecting and resending a `start` event.
1. **Send `start` first** - The connection closes if any other event is sent before `start`.
1. **Choose the right audio format** - Match the format to your source: `mulaw_8000` for telephony, `pcm_44100` for web clients.
1. **Handle closes cleanly** - Always capture close codes and reasons for debugging and recovery.
1. **Keep the connection alive** - Send WebSocket ping frames every 20-25 seconds to avoid the 30-second inactivity timeout.
1. **Manage stream IDs** - Provide your own `stream_id` values to improve observability across systems.
1. **Recover from idle timeouts** - On `1000 / connection idle timeout`, reconnect and resend a `start` event.
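The first and fifth practices above can be enforced on the client. A minimal sketch, assuming the `start` event carries `stream_id` the same way the custom event does; any other `start` fields (audio format and so on) are passed through untouched because their names aren't shown in this thread. `AgentEventSender` is hypothetical and works with any object that has `readyState` and `send()`, such as a browser `WebSocket` or a `ws` socket.

```javascript
const WS_OPEN = 1; // WebSocket.OPEN in both the WHATWG API and the ws package

// Hypothetical wrapper: refuses to send anything before `start`, since the
// server closes the connection if another event arrives first.
class AgentEventSender {
  constructor(socket, streamId) {
    this.socket = socket;
    this.streamId = streamId; // caller-supplied, for observability
    this.started = false;
  }

  // Sends `start` exactly once; extraFields are passed through as-is.
  start(extraFields = {}) {
    if (this.started) throw new Error("start already sent");
    this.#send({ ...extraFields, event: "start", stream_id: this.streamId });
    this.started = true;
  }

  // Rejects any non-start event locally until `start` has gone out.
  send(event, fields = {}) {
    if (event === "start") return this.start(fields);
    if (!this.started) {
      throw new Error(`cannot send "${event}" before "start"`);
    }
    this.#send({ ...fields, event, stream_id: this.streamId });
  }

  #send(message) {
    if (this.socket.readyState !== WS_OPEN) {
      throw new Error("socket is not open");
    }
    this.socket.send(JSON.stringify(message));
  }
}
```

Generate the stream ID yourself, for example with `crypto.randomUUID()` (available in modern browsers and recent Node.js), and log it alongside your own call records.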

```javascript
// JavaScript example
setInterval(() => {
  if (websocket.readyState === WebSocket.OPEN) {
    websocket.ping();
  }
}, 20000); // Send ping every 20 seconds
```

Contributor

`websocket.ping()` isn't supported in browsers, so we should make this example specific to Node.js.
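Following the comment above, a Node.js-only sketch of the ping keepalive. It assumes the `ws` package, where `socket.ping()` sends a protocol-level ping frame and the socket is an `EventEmitter`; the 20-second interval comes from the 20-25 second guidance in the best practices. `keepaliveTick` and `startKeepalive` are hypothetical helper names.

```javascript
const WS_OPEN = 1; // WebSocket.OPEN
const KEEPALIVE_INTERVAL_MS = 20000; // inside the 20-25 s range, under the 30 s timeout

// One keepalive step: sends a ping frame only if the socket is open.
// `ping()` exists on the Node.js `ws` WebSocket, not on the browser WebSocket.
function keepaliveTick(socket) {
  if (socket.readyState !== WS_OPEN) return false;
  socket.ping();
  return true;
}

// Starts the interval and clears it automatically once the socket closes.
function startKeepalive(socket, intervalMs = KEEPALIVE_INTERVAL_MS) {
  const timer = setInterval(() => keepaliveTick(socket), intervalMs);
  socket.once("close", () => clearInterval(timer));
  return timer;
}
```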

Contributor

Add browser-specific example:

```javascript
// Browser example: send a custom keepalive event
setInterval(() => {
  if (websocket.readyState === WebSocket.OPEN) {
    websocket.send(JSON.stringify({
      event: "custom",
      stream_id: "unique_id",
      metadata: { type: "heartbeat" }
    }));
  }
}, 20000); // every 20s to avoid 30s idle timeout
```

Contributor

Suggested change
}, 20000); // Send ping every 20 seconds
}, 20000); // every 20s to avoid 30s idle timeout
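The last best practice says to reconnect and resend `start` after an idle-timeout close. A minimal sketch, assuming the close event carries code `1000` with a reason containing `connection idle timeout` (the pair written as `1000 / connection idle timeout` in the best-practices list). Because 1000 is the generic normal-closure code in RFC 6455, the sketch also matches on the reason text. `createSocket`, `sendStart`, and `maxReconnects` are hypothetical; the listeners use `addEventListener`, which both browser sockets and the `ws` package support.

```javascript
const IDLE_TIMEOUT_CODE = 1000;
const IDLE_TIMEOUT_REASON = "connection idle timeout";

// True when a close looks like the server's inactivity timeout.
// Matching on the reason text is an assumption; 1000 alone just means
// "normal closure" and also covers ordinary hang-ups.
function isIdleTimeoutClose(code, reason) {
  return code === IDLE_TIMEOUT_CODE &&
    String(reason).toLowerCase().includes(IDLE_TIMEOUT_REASON);
}

// Hypothetical reconnect loop: createSocket() must return a new WebSocket-like
// object, and sendStart(socket) must send a fresh `start` once it opens.
function connectWithIdleRecovery(createSocket, sendStart, maxReconnects = 3) {
  let reconnects = 0;
  const connect = () => {
    const socket = createSocket();
    socket.addEventListener("open", () => sendStart(socket));
    socket.addEventListener("close", (event) => {
      if (isIdleTimeoutClose(event.code, event.reason) && reconnects < maxReconnects) {
        reconnects += 1;
        connect(); // new socket, new `start`
      }
    });
    return socket;
  };
  return connect();
}
```

The reconnect cap keeps a client from looping forever if something keeps tripping the timeout.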


### Inactivity Timeout

The server automatically closes idle WebSocket connections after **30 seconds** of inactivity. Activity is defined as receiving any message from the client, including:
Contributor

Suggested change
The server automatically closes idle WebSocket connections after **30 seconds** of inactivity. Activity is defined as receiving any message from the client, including:
The server closes idle WebSocket connections after **30 seconds** without client activity. Any client message counts as activity, including:


### Ping/Pong Keepalive

To prevent inactivity timeouts during periods of silence, use standard WebSocket ping frames for periodic keepalive:
Contributor

Suggested change
To prevent inactivity timeouts during periods of silence, use standard WebSocket ping frames for periodic keepalive:
Send periodic WebSocket ping frames to keep the connection alive:

Contributor

"Periods of silence" is misleading.

Contributor Author

@sauhardjain I think the issue is that we don't want folks keeping this alive indefinitely without reason. We'd get paid (lol), but they'd probably get pissed.
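One way to address this concern is to keep pinging only while the call is plausibly still in use. A sketch of that decision, assuming the client tracks when it last sent any message and when it last sent real audio; `shouldSendKeepalive` and `maxSilenceMs` are hypothetical (a client-side budget, not a server setting), and the 10-second margin is arbitrary.

```javascript
const IDLE_TIMEOUT_MS = 30000; // server closes after 30 s without client activity

// Returns true when a keepalive ping should be sent now.
//   now          - current time in ms
//   lastSentAt   - when the client last sent anything (audio, events, pings)
//   lastAudioAt  - when the client last sent real audio
//   maxSilenceMs - hypothetical cap on how long silence is kept alive
//   marginMs     - how long before the server timeout to ping (arbitrary)
function shouldSendKeepalive({ now, lastSentAt, lastAudioAt, maxSilenceMs, marginMs = 10000 }) {
  // Silence budget spent: stop pinging and let the server close the call.
  if (now - lastAudioAt >= maxSilenceMs) return false;
  // Ping only once the connection is getting close to the idle timeout.
  return now - lastSentAt >= IDLE_TIMEOUT_MS - marginMs;
}
```

A timer could call this every few seconds and send a ping only when it returns `true`; once the silence budget is spent, the client stops pinging and the server's 30-second timeout closes the call.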

4 participants