Add server-side publishing guide for AI Transport #3227
Conversation
New page in AI Transport > Token streaming covering Realtime connections, message ordering guarantees, transient publishing and channel limits, per-connection rate limits for both message-per-response and message-per-token patterns, and a connection pool example for handling multiple concurrent streams.
mschristensen left a comment
Some nice info in here.
I'm not sure about the connection pool implementation specifics here, but I think the abstraction could be useful. I wonder if it's worth just implementing it in e.g. ably-js and reviewing the implementation through a PR there?
## Transient publishing and channel limits <a id="transient"/>

In a typical AI application, your server publishes responses to many distinct channels, often one per user session. When your server only publishes to a channel without subscribing, the SDK uses a [transient publish](/docs/pub-sub/advanced#transient-publish). Transient publishes do not count toward the limit on [number of channels per connection](/docs/platform/pricing/limits#connection).

Suggested change:

In a typical AI application, your server publishes responses to many distinct channels, often one per user session. When your server publishes to a channel without attaching first, the SDK uses a [transient publish](/docs/pub-sub/advanced#transient-publish). Transient publishes do not count toward the limit on the [number of channels per connection](/docs/platform/pricing/limits#connection).

<Aside data-type="note">
The server must attach to the channel in order to subscribe to it. In this case, the SDK client instance will not use transient publishing.
</Aside>
All message actions use the same transient publish path, including `publish()` and `appendMessage()`. This means a single connection can publish to thousands of distinct channels without hitting the channel limit. No additional configuration is required. When you call `publish()` or `appendMessage()` on a channel that the client has not explicitly attached to, the SDK handles the transient attachment automatically.
The constraint to be aware of is the [per-connection inbound message rate](/docs/platform/pricing/limits#connection), not the number of channels.

Why is this the case?

If you also need to subscribe to channels on the same connection, those subscriptions require explicit attachment and will count toward the channel limit.
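To make the rate constraint above concrete, here is a back-of-envelope sketch of how the publish pattern determines how many concurrent streams one connection can carry. The function name and all numbers are illustrative placeholders, not real Ably limits; check the platform limits page for actual per-connection figures.

```javascript
// Rough sketch: estimate how many concurrent response streams a single
// connection can sustain given a per-connection message rate budget.
// Both arguments are placeholders for illustration only.
function maxConcurrentStreams(perConnectionMsgRateLimit, msgsPerSecondPerStream) {
  return Math.floor(perConnectionMsgRateLimit / msgsPerSecondPerStream);
}

// A message-per-token pattern publishes one message per generated token,
// so it consumes the rate budget much faster than message-per-response,
// which publishes roughly once per stream update.
const tokenPattern = maxConcurrentStreams(1000, 50); // ~50 tokens/s per stream
const responsePattern = maxConcurrentStreams(1000, 1);
```

The channel count never enters this arithmetic, which is why the message rate, not the number of channels, is the limit that matters for transient publishing.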
## Per-connection rate limits <a id="rate-limits"/>
I feel that the content in this section belongs in the existing /docs/ai-transport/token-streaming/token-rate-limits page. Were you aware of that page?
```javascript
class AblyConnectionPool {
```
I'm wondering whether we should call this AblyClientPool. I know there is one connection per client, but there is the concept of a connection inside the client (e.g. the connection state listener `client.connection.on` etc.), so it feels a bit weird to call it the same thing a layer above the client.

(We also call it getClient below.)
If we think an abstraction like this is useful, I wonder if it's worth adding to the SDK
```javascript
newClient.connection.on((stateChange) => {
  console.warn(`[Pool conn ${index}] ${stateChange.previous} → ${stateChange.current}`);
  if (stateChange.current === 'failed') {
    this._replaceConnection(index);
```
If a connection fails, there is probably a network issue, and creating a new instance seems unlikely to recover the situation. (Also, in this case, if the new client's connection enters the failed state as a result, this might overflow the call stack?)
When your server handles more concurrent AI response streams than a single connection supports, create additional Realtime clients. Each client uses its own connection with its own message rate budget, so throughput scales linearly with the number of connections.
Route channels to connections using consistent hashing so that all operations for a given channel always go through the same connection. This preserves [message ordering](#ordering) for each response.
The implementation looks like standard modulo hashing, not consistent hashing
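For reference, the routing under discussion can be sketched as simple modulo hashing over the channel name. This is a minimal sketch under stated assumptions: `ClientPool`, `indexFor`, and the injected `createClient` factory are hypothetical names, and in practice the factory would construct Ably Realtime client instances.

```javascript
// Sketch of routing channels to a fixed set of clients by modulo hashing
// the channel name. Names here are illustrative, not SDK API.
class ClientPool {
  constructor(size, createClient) {
    // One client (and therefore one underlying connection) per pool slot.
    this.clients = Array.from({ length: size }, (_, i) => createClient(i));
  }

  // Deterministically map a channel name to a pool slot, so every operation
  // for that channel goes through the same connection and per-channel
  // message ordering is preserved.
  indexFor(channelName) {
    let hash = 0;
    for (const ch of channelName) {
      hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple 32-bit string hash
    }
    return hash % this.clients.length;
  }

  clientFor(channelName) {
    return this.clients[this.indexFor(channelName)];
  }
}
```

As noted in review, this is modulo hashing rather than consistent hashing: resizing the pool remaps most channels to different connections, whereas a consistent hashing scheme would minimize that remapping.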
I've additionally included a nice single-page web app here so that you can see and test the AblyConnectionPool code: test-connection-pool.html. Just open that HTML page in your browser and provide a prod API key.