
Add server-side publishing guide for AI Transport#3227

Open
zknill wants to merge 1 commit into main from
zak/ait-443/server-side-token-streaming-publish

Conversation

Contributor

@zknill zknill commented Feb 25, 2026

New page in AI Transport > Token streaming covering Realtime connections, message ordering guarantees, transient publishing and channel limits, per-connection rate limits for both message-per-response and message-per-token patterns, and a connection pool example for handling multiple concurrent streams.


I've also included a single-page web app so that you can see and test the AblyConnectionPool code:

test-connection-pool.html

Just open that HTML page in your browser and enter a production API key.

@zknill zknill requested a review from mschristensen February 25, 2026 17:34

coderabbitai bot commented Feb 25, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



Contributor

@mschristensen mschristensen left a comment


Some nice info in here.

I'm not sure about the connection pool implementation specifics here, but I think the abstraction could be useful. I wonder if it's worth implementing it in e.g. ably-js and reviewing the implementation through a PR there?


## Transient publishing and channel limits <a id="transient"/>

In a typical AI application, your server publishes responses to many distinct channels, often one per user session. When your server only publishes to a channel without subscribing, the SDK uses a [transient publish](/docs/pub-sub/advanced#transient-publish). Transient publishes do not count toward the limit on [number of channels per connection](/docs/platform/pricing/limits#connection).
Contributor


Suggested change
In a typical AI application, your server publishes responses to many distinct channels, often one per user session. When your server only publishes to a channel without subscribing, the SDK uses a [transient publish](/docs/pub-sub/advanced#transient-publish). Transient publishes do not count toward the limit on [number of channels per connection](/docs/platform/pricing/limits#connection).
In a typical AI application, your server publishes responses to many distinct channels, often one per user session. When your server publishes to a channel without attaching first, the SDK uses a [transient publish](/docs/pub-sub/advanced#transient-publish). Transient publishes do not count toward the limit on the [number of channels per connection](/docs/platform/pricing/limits#connection).
<Aside data-type="note">
The server must attach to the channel in order to subscribe to it. In this case, the SDK client instance will not use transient publishing.
</Aside>


All message actions use the same transient publish path, including `publish()` and `appendMessage()`. This means a single connection can publish to thousands of distinct channels without hitting the channel limit. No additional configuration is required. When you call `publish()` or `appendMessage()` on a channel that the client has not explicitly attached to, the SDK handles the transient attachment automatically.

The constraint to be aware of is the [per-connection inbound message rate](/docs/platform/pricing/limits#connection), not the number of channels.
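As a sketch of the publish-only pattern described above: calling `publish()` on a channel the client has never attached to takes the transient publish path with no extra configuration. The `publishToken` helper below is hypothetical (not part of the PR), written against the standard ably-js channel API.

```javascript
// Hypothetical helper illustrating the transient publish path described above.
// channels.get() only creates a local channel object; it does not attach, so
// publishing here does not count toward the per-connection channel limit.
async function publishToken(client, channelName, token) {
  const channel = client.channels.get(channelName);
  // publish() on a never-attached channel uses a transient publish.
  await channel.publish('token', { text: token });
}
```

A server handling many sessions would call this with a distinct channel name per session (e.g. `publishToken(client, 'chat:session-1', 'hello')`), all over one connection.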
Contributor


Why is this the case?

If you also need to subscribe to channels on the same connection, those subscriptions require explicit attachment and will count toward the channel limit.
</Aside>

## Per-connection rate limits <a id="rate-limits"/>
Contributor


I feel that the content in this section belongs in the existing /docs/ai-transport/token-streaming/token-rate-limits page. Were you aware of that page?


<Code>
```javascript
class AblyConnectionPool {
Contributor


I'm wondering whether we should call this AblyClientPool. I know there is one connection per client, but there is also the concept of a connection inside the client (e.g. the connection state listener, client.connection.on), so it feels a bit odd to give the same name to something a layer above the client.

(We also call it getClient below)

Contributor


If we think an abstraction like this is useful, I wonder if it's worth adding to the SDK

newClient.connection.on((stateChange) => {
  console.warn(`[Pool conn ${index}] ${stateChange.previous} → ${stateChange.current}`);
  if (stateChange.current === 'failed') {
    this._replaceConnection(index);
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a connection fails, there is probably a network issue, and creating a new instance seems unlikely to recover the situation. (Also, in this case, if the new client's connection enters the failed state as a result, might this overflow the call stack?)


When your server handles more concurrent AI response streams than a single connection supports, create additional Realtime clients. Each client uses its own connection with its own message rate budget, so throughput scales linearly with the number of connections.

Route channels to connections using consistent hashing so that all operations for a given channel always go through the same connection. This preserves [message ordering](#ordering) for each response.
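The routing step can be sketched as a pure function, assuming a simple hash-modulo scheme (as a reviewer notes below, the PR's implementation is modulo hashing rather than true consistent hashing; `hashString` and `routeChannel` are illustrative names, not part of the PR):

```javascript
// Stable string hash (a simple 31-multiplier rolling hash, kept unsigned 32-bit).
function hashString(str) {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = (hash * 31 + str.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Map a channel name to a fixed index into the connection pool. The same
// channel always resolves to the same connection, which preserves per-channel
// message ordering. Note: with modulo hashing, changing poolSize remaps most
// channels, unlike consistent hashing.
function routeChannel(channelName, poolSize) {
  return hashString(channelName) % poolSize;
}
```

For example, `routeChannel('chat:session-42', 4)` always yields the same index between 0 and 3, so every publish for that session goes through the same underlying connection.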
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks like standard modulo hashing, not consistent hashing


Labels

review-app Create a Heroku review app

3 participants