
Add server-side publishing guide for AI Transport#3227

Open
zknill wants to merge 1 commit into main from
zak/ait-443/server-side-token-streaming-publish

Conversation

Contributor

@zknill zknill commented Feb 25, 2026

New page in AI Transport > Token streaming covering Realtime connections, message ordering guarantees, transient publishing and channel limits, per-connection rate limits for both message-per-response and message-per-token patterns, and a connection pool example for handling multiple concurrent streams.


I've also included a single-page web app so that you can see and test the AblyConnectionPool code:

test-connection-pool.html

Just open that HTML page in your browser and enter a production API key.

@zknill zknill requested a review from mschristensen February 25, 2026 17:34

coderabbitai bot commented Feb 25, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



Contributor

@mschristensen mschristensen left a comment


Some nice info in here.

I'm not sure about the connection pool implementation specifics here, but I think the abstraction could be useful. I wonder if it's worth implementing it in e.g. ably-js and reviewing the implementation through a PR there?


## Transient publishing and channel limits <a id="transient"/>

In a typical AI application, your server publishes responses to many distinct channels, often one per user session. When your server only publishes to a channel without subscribing, the SDK uses a [transient publish](/docs/pub-sub/advanced#transient-publish). Transient publishes do not count toward the limit on [number of channels per connection](/docs/platform/pricing/limits#connection).
Contributor


Suggested change
In a typical AI application, your server publishes responses to many distinct channels, often one per user session. When your server only publishes to a channel without subscribing, the SDK uses a [transient publish](/docs/pub-sub/advanced#transient-publish). Transient publishes do not count toward the limit on [number of channels per connection](/docs/platform/pricing/limits#connection).
In a typical AI application, your server publishes responses to many distinct channels, often one per user session. When your server publishes to a channel without attaching first, the SDK uses a [transient publish](/docs/pub-sub/advanced#transient-publish). Transient publishes do not count toward the limit on the [number of channels per connection](/docs/platform/pricing/limits#connection).
<Aside data-type="note">
The server must attach to the channel in order to subscribe to it. In this case, the SDK client instance will not use transient publishing.
</Aside>


All message actions use the same transient publish path, including `publish()` and `appendMessage()`. This means a single connection can publish to thousands of distinct channels without hitting the channel limit. No additional configuration is required. When you call `publish()` or `appendMessage()` on a channel that the client has not explicitly attached to, the SDK handles the transient attachment automatically.

The constraint to be aware of is the [per-connection inbound message rate](/docs/platform/pricing/limits#connection), not the number of channels.
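As a sketch of the publish-only pattern described above: calling `publish()` on a channel the client has never attached to takes the transient publish path with no extra configuration. The `publishToken` helper below is hypothetical (not part of the PR), written against the standard ably-js channel API.

```javascript
// Hypothetical helper illustrating the transient publish path described above.
// channels.get() only creates a local channel object; it does not attach, so
// publishing here does not count toward the per-connection channel limit.
async function publishToken(client, channelName, token) {
  const channel = client.channels.get(channelName);
  // publish() on a never-attached channel uses a transient publish.
  await channel.publish('token', { text: token });
}
```

A server handling many sessions would call this with a distinct channel name per session (e.g. `publishToken(client, 'chat:session-1', 'hello')`), all over one connection.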
Contributor


Why is this the case?

If you also need to subscribe to channels on the same connection, those subscriptions require explicit attachment and will count toward the channel limit.
</Aside>

## Per-connection rate limits <a id="rate-limits"/>
Contributor


I feel that the content in this section belongs in the existing /docs/ai-transport/token-streaming/token-rate-limits page. Were you aware of that page?


<Code>
```javascript
class AblyConnectionPool {
Contributor


I'm wondering whether we should call this AblyClientPool. I know there is one connection per client, but there is also the concept of a connection inside the client (e.g. the connection state listener, client.connection.on), so it feels a bit odd to give the same name to something a layer above the client.

(We also call it getClient below)

Contributor


If we think an abstraction like this is useful, I wonder if it's worth adding to the SDK

newClient.connection.on((stateChange) => {
  console.warn(`[Pool conn ${index}] ${stateChange.previous} → ${stateChange.current}`);
  if (stateChange.current === 'failed') {
    this._replaceConnection(index);
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a connection fails, there is probably a network issue, and creating a new instance seems unlikely to recover the situation. (Also, in this case, if the new client's connection enters the failed state as a result, might this overflow the call stack?)


When your server handles more concurrent AI response streams than a single connection supports, create additional Realtime clients. Each client uses its own connection with its own message rate budget, so throughput scales linearly with the number of connections.

Route channels to connections using consistent hashing so that all operations for a given channel always go through the same connection. This preserves [message ordering](#ordering) for each response.
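The routing step can be sketched as a pure function, assuming a simple hash-modulo scheme (as a reviewer notes below, the PR's implementation is modulo hashing rather than true consistent hashing; `hashString` and `routeChannel` are illustrative names, not part of the PR):

```javascript
// Stable string hash (a simple 31-multiplier rolling hash, kept unsigned 32-bit).
function hashString(str) {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = (hash * 31 + str.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Map a channel name to a fixed index into the connection pool. The same
// channel always resolves to the same connection, which preserves per-channel
// message ordering. Note: with modulo hashing, changing poolSize remaps most
// channels, unlike consistent hashing.
function routeChannel(channelName, poolSize) {
  return hashString(channelName) % poolSize;
}
```

For example, `routeChannel('chat:session-42', 4)` always yields the same index between 0 and 3, so every publish for that session goes through the same underlying connection.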
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks like standard modulo hashing, not consistent hashing


Labels

review-app Create a Heroku review app

3 participants