-
Notifications
You must be signed in to change notification settings - Fork 664
Aggregate and broadcast DbUpdates off the main thread
#2793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
and after releasing the datastore lock. This commit moves some amount of work off the main `eval_updates_sequential` task and into a new background worker, the `send_worker`. The actual work performed is unchanged, it's just that some of it is moved to a different function which runs asynchronously after releasing the datastore lock. We expect two benefits from this: 1. Most obviously, we're now doing less work while holding the lock, reducing the size of our critical section. 2. We suspect that we had a thundering herd problem where we wake each client's WebSocket worker just before waking the next reducer task, making it wait in the Tokio task queue until all the client messages have been sent. This change should cause the reducer task to wake first, improving transaction throughput. Moving the aggregation and broadcast to a separate task required enclosing the `SubscriptionManager`'s client map in an `Arc<RwLock>`. There's some amount of tricky code in there, with comments, that has to go out of its way to avoid deadlocks. I also chose to change the `table_name` to be an `Arc<str>` rather than a `Box<str>`, as it needed to be cloned from `eval_updates_sequential` into the `send_worker`, where previously it was just borrowed within `eval_updates_sequential`. We could further reduce cloning by threading the change from `Box` to `Arc` further through our code, but I chose to keep the change relatively contained within the subscription-related code. Additionally, while reworking the `SubscriptionManger`'s code, I changed from using tuples to structs in a few places. This leaked out a bit, and I changed some methods which used to take `(args: (T, U))` to instead take `arg_t: T, arg_u: U`. The only callers outside of `module_subscription_manager.rs` (the ones I was changing anyways) were constructing the tuple in the call anyways, and so now look cleaner. Many tests are changed to now require a Tokio runtime, so that the `SubscriptionManager` can `tokio::spawn` its `Self::send_worker` future. Some tests which previously had a single-threaded runtime now require a multi-threaded one. This is mildly painful.
|
@joshua-spacetime suggested that, rather than enclosing the |
Could you please newtype this to |
This commit adds a new metric, `subscription_send_queue_length`. It is incremented by `eval_updates_sequential` before pushing into the new queue, and decremented by the `send_worker` after popping. Because the metric is labeled with `database_identity: Identity`, the `SubscriptionManager` now needs to know its `Identity`, and so cannot be `Default`. Methods are added to make constructing one in tests easier.
Incl.: - Renaming `ClientQueryUpdate` to `SingleQueryUpdate` (not requested, but a better name), and moving it into `websocket.rs` (requested). - Restoring uncurrying of `TableUpdate::new` and `TableUpdate::push`, which shifts the partial abstraction around in a way I find irrelevant but Mazdak seems to appreciate. Specifically, the aggregator loop now doesn't have to destructure its `SingleQueryUpdate`s, but `ExecutionUnit::eval` and `collect_table_update` have to construct them needlessly. - Consolidating test constructors for `ModuleSubscriptions`. - Various comments. - Separating the body of the `send_worker`'s loop into a separate function.
…clockworklabs/SpacetimeDB into phoebe/subscription-manager-send-worker
Centril
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat!
Thank god for `clippy::disallowed_macros`!
…-manager-send-worker


Description of Changes
and after releasing the datastore lock.
This commit moves some amount of work off the main
eval_updates_sequentialtask and into a new background worker, thesend_worker. The actual work performed is unchanged, it's just that some of it is moved to a different function which runs asynchronously after releasing the datastore lock. We expect two benefits from this:Moving the aggregation and broadcast to a separate task required enclosing the
SubscriptionManager's client map in anArc<RwLock>. There's some amount of tricky code in there, with comments, that has to go out of its way to avoid deadlocks.I also chose to change the
table_nameto be anArc<str>rather than aBox<str>, as it needed to be cloned fromeval_updates_sequentialinto thesend_worker, where previously it was just borrowed withineval_updates_sequential. We could further reduce cloning by threading the change fromBoxtoArcfurther through our code, but I chose to keep the change relatively contained within the subscription-related code.Additionally, while reworking the
SubscriptionManger's code, I changed from using tuples to structs in a few places. This leaked out a bit, and I changed some methods which used to take(args: (T, U))to instead takearg_t: T, arg_u: U.The only callers outside of
module_subscription_manager.rs(the ones I was changing anyways) were constructing the tuple in the call anyways, and so now look cleaner.Many tests are changed to now require a Tokio runtime, so that the
SubscriptionManagercantokio::spawnitsSelf::send_workerfuture. Some tests which previously had a single-threaded runtime now require a multi-threaded one. This is mildly painful.A new metric,
subscription_send_queue_length, is also added. It is incremented byeval_updates_sequentialbefore pushing into the new queue, and decremented by thesend_workerafter popping.Because the metric is labeled with
database_identity: Identity, theSubscriptionManagernow needs to know itsIdentity, and so cannot beDefault. Methods are added to make constructing one in tests easier.API and ABI breaking changes
N/a
Expected complexity level and risk
3, some risk of deadlock due to added synchronization with the client map.
Testing