MSC3944: Dropping stale send-to-device messages #3944
Conversation
This looks great, and would help us out over at Beeper I think. We've been seeing clients (especially Element Web) generate a fair amount of room key chatter, and mobile apps that aren't actively syncing at the time end up with a large backlog of requests and cancellations when they do sync again. Since Synapse also limits the number of to_device events to 100 per response, this means we have to do several syncs back-to-back before we can consider the app "caught up".
- `m.room_key_request` with `action: request`, if there is a corresponding
  `m.room_key_request` message with `action: request_cancellation` and the same
  `request_id` and `requesting_device_id` fields, sent by the same user. If
  the request message is dropped, the cancellation message is dropped as well.
- `m.secret.request` with `action: request`, if there is a corresponding
  `m.secret.request` with `action: request_cancellation` and the same
  `request_id` and `requesting_device_id` fields, sent by the same user. If
  the request message is dropped, the cancellation message is dropped as well.
- `m.key.verification.request`, if there is a corresponding
  `m.key.verification.cancel` with the same `transaction_id` field, sent by the
  same user. If the request message is dropped, the cancellation message is
  dropped as well.
- `m.key.verification.request` when the `timestamp` is more than 10 minutes in
  the past. Since clients are to ignore verification requests that are older
  than this, they can be safely dropped. The server may wish to remember the
  `transaction_id` and sender for some time, so that it can also drop the
  `m.key.verification.cancel` message with the same `transaction_id` and
  sender, if such a message gets sent.
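
To make these pairing rules concrete, here is a minimal sketch of how a server might prune a pending to-device queue. The queue shape and the `prune_pending` helper are assumptions for illustration only; the event types and field names come from the rules above.

```python
import time

# Clients ignore verification requests older than this, so the server can drop them.
VERIFICATION_REQUEST_MAX_AGE_MS = 10 * 60 * 1000

def prune_pending(pending, now_ms=None):
    """Drop cancelled requests (and their cancellations) plus stale verification requests.

    `pending` is assumed to be a list of dicts with "sender", "type" and "content" keys.
    """
    now_ms = now_ms if now_ms is not None else int(time.time() * 1000)
    drop = set()

    # Index cancellations by (sender, type, request_id, requesting_device_id).
    cancels = {}
    for i, msg in enumerate(pending):
        c = msg["content"]
        if msg["type"] in ("m.room_key_request", "m.secret.request") \
                and c.get("action") == "request_cancellation":
            key = (msg["sender"], msg["type"],
                   c.get("request_id"), c.get("requesting_device_id"))
            cancels.setdefault(key, []).append(i)

    for i, msg in enumerate(pending):
        c = msg["content"]
        t = msg["type"]

        # Rules 1 and 2: a request with a matching cancellation from the same
        # user is dropped, and so are the matching cancellation(s).
        if t in ("m.room_key_request", "m.secret.request") and c.get("action") == "request":
            key = (msg["sender"], t, c.get("request_id"), c.get("requesting_device_id"))
            if key in cancels:
                drop.add(i)
                drop.update(cancels[key])

        # Rule 4: verification requests whose timestamp is more than 10 minutes old.
        # (Rule 3, pairing m.key.verification.request with m.key.verification.cancel
        # via transaction_id, follows the same indexing pattern as rules 1 and 2.)
        if t == "m.key.verification.request":
            ts = c.get("timestamp")
            if ts is not None and now_ms - ts > VERIFICATION_REQUEST_MAX_AGE_MS:
                drop.add(i)

    return [msg for i, msg in enumerate(pending) if i not in drop]
```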
It's a bit frustrating that we need to set out special rules for each type of to-device message, rather than laying out a generic framework for cancellable and expiring to-device messages. It feels like something of a layering violation, and feels like we're always going to be extending this list (with consequent extra work for server implementations).
the `requesting_device_id` (in the case of `m.room_key_request` and
`m.secret.request`), or `device_id` (in the case of
`m.key.verification.request`) refers to a device that no longer exists.
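
A rough sketch of this check, assuming the server can look up the sender's currently known device IDs (the helper name and lookup are illustrative, not specified by the MSC):

```python
def references_dead_device(msg, live_device_ids):
    """True if the message's requesting device no longer exists for the sender."""
    c = msg["content"]
    if msg["type"] in ("m.room_key_request", "m.secret.request"):
        device_id = c.get("requesting_device_id")
    elif msg["type"] == "m.key.verification.request":
        device_id = c.get("device_id")  # field name as given in the rule above
    else:
        return False
    return device_id is not None and device_id not in live_device_ids
```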
I would extend this to include semantically equivalent to-device messages. There's real-world evidence that some clients will send request cancellations with the exact same request ID / requesting device ID many (>40) times.
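
As an illustration of that suggestion (not something the MSC currently specifies), duplicate cancellations could be collapsed by keeping only the first message for each (sender, type, request_id, requesting_device_id) tuple; a rough sketch:

```python
def dedupe_cancellations(pending):
    """Keep only the first of any semantically equivalent cancellation messages."""
    seen = set()
    out = []
    for msg in pending:
        c = msg["content"]
        if msg["type"] in ("m.room_key_request", "m.secret.request") \
                and c.get("action") == "request_cancellation":
            key = (msg["sender"], msg["type"],
                   c.get("request_id"), c.get("requesting_device_id"))
            if key in seen:
                continue  # drop the repeated cancellation
            seen.add(key)
        out.append(msg)
    return out
```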
@@ -0,0 +1,70 @@
# MSC3944: Dropping stale send-to-device messages |
I'm going to implement this in the sliding sync proxy because it is aligned with the core bandwidth/speed goals of MSC3575. Before implementing it, though, I did some analysis on the size and kinds of to-device events that stack up on the proxy for clients:
- The largest stack had a huge 237852 to-device events pending. This is uncommon, and rapidly drops off. The median is somewhere in the 100s.
- Breaking down by event type, the vast majority by several orders of magnitude are `m.room_key_request` events:

      event_type                 |  count
      ---------------------------+--------
      m.room_key_request         | 121067
      m.room.encrypted           |    256
      m.secret.request           |     51
      m.key.verification.cancel  |     16
      m.key.verification.request |     13
- By following the semantics in this MSC, I found that:

      28089 to-device messages
      14495 unique device|req_id tuples
       1564 outstanding requests
      12931 cancelled requests
        260 multi-cancelled requests
      potential savings: removing 26525 messages, or 94.43%
- "multi-cancelled" requests are identical request cancellation events for the same request ID / requesting device ID.
- The 94% figure is consistent on large (100,000s events) accounts as well as mid-range (10,000s events) accounts. The smaller the stack, the less the benefit. On a device with 204 pending events, this MSC would remove 99 of them (48%).
There is real and immediate benefit to be had by implementing this MSC everywhere, particularly for `m.room_key_request` events, which is the event type I ran this analysis on.
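
For reference, counts along these lines can be produced with a single pass over a device's pending `m.room_key_request` messages; a rough sketch of such counting logic (illustrative only, not the actual query run against the proxy's database):

```python
from collections import Counter

def summarise_room_key_requests(pending):
    """Count outstanding, cancelled and multi-cancelled m.room_key_request tuples."""
    requested = set()
    cancel_counts = Counter()
    for msg in pending:
        if msg["type"] != "m.room_key_request":
            continue
        c = msg["content"]
        key = (c.get("requesting_device_id"), c.get("request_id"))
        if c.get("action") == "request":
            requested.add(key)
        elif c.get("action") == "request_cancellation":
            cancel_counts[key] += 1

    cancelled = {k for k in requested if cancel_counts[k] > 0}
    multi_cancelled = {k for k, n in cancel_counts.items() if n > 1}
    return {
        "unique_tuples": len(requested | set(cancel_counts)),
        "outstanding": len(requested - cancelled),
        "cancelled": len(cancelled),
        "multi_cancelled": len(multi_cancelled),
    }
```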