
Implement MSC3944 to delete stale to-device messages #16082

Open

clokep opened this issue Aug 7, 2023 · 6 comments

clokep (Member) commented Aug 7, 2023

MSC3944 defines some conditions where it is safe to delete "stale" to-device messages. See the MSC for specifics on when this is true.

See #15842 for a potential implementation.

This will help with #3599, but it is unclear how much.

@clokep clokep added S-Minor Blocks non-critical functionality, workarounds exist. T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements. O-Frequent Affects or can be seen by most users regularly or impacts most users' first experience labels Aug 7, 2023
@clokep clokep self-assigned this Aug 7, 2023
clokep (Member, Author) commented Aug 7, 2023

The proposed rules for when to delete stale messages generally fall into 3 categories:

  • Delete a to-device request when a subsequent cancellation request with the same parameters is received (but neither has been delivered).
  • Coalesce duplicate requests that have not been delivered.
  • Delete old m.key.verification.request messages, as clients are required to discard them anyway.

Unfortunately, the fields we're searching on aren't exposed in the database as separate columns.

We'll want to handle this for both newly received to-device messages as well as for any old messages that might currently exist in the database.
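
As a rough illustration of the first rule, here is a minimal sketch of matching a cancellation against a pending request by parsing the stored JSON directly. The helper name and the plain-JSON storage format are assumptions; the field names are those of `m.room_key_request`:

```python
import json


def is_cancellation_of(cancel_json: str, request_json: str) -> bool:
    """Rough check of whether a stored cancellation matches a pending
    m.room_key_request, based on the fields MSC3944 compares.  Assumes the
    message bodies are stored as plain JSON text."""
    cancel = json.loads(cancel_json)
    request = json.loads(request_json)

    if cancel.get("action") != "request_cancellation":
        return False
    if request.get("action") != "request":
        return False

    # A cancellation matches when it comes from the same device and carries
    # the same request ID as the original request.
    return (
        cancel.get("requesting_device_id") == request.get("requesting_device_id")
        and cancel.get("request_id") == request.get("request_id")
    )
```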

clokep (Member, Author) commented Aug 7, 2023

#15842 attempts to do this when a user uploads new to-device messages. I think this has a couple of issues:

  1. It doesn't handle to-device messages over federation (although it could be refactored to);
  2. It blocks the user's to-device messages from being added to the database until the stale logic completes.

Another approach might be to run a background loop that processes the to-device stream and handles stale messages. (This is similar to how we do the event_push_actions clean-up / summarization.) This might not capture every "stale" to-device message if the background loop doesn't run until after the to-device message is delivered, but if it keeps relatively up to date with "new" messages then it should work fine (and if a device is online and receiving to-device messages then it is likely that the stale messages would already have been delivered and there's nothing to do anyway).
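
A minimal sketch of what such a loop could look like, written with plain asyncio rather than Synapse's actual background-process machinery; the `store` object and its methods are hypothetical stand-ins for the real storage layer:

```python
import asyncio

STALE_CHECK_INTERVAL_SECS = 60  # made-up interval; would need tuning


async def prune_stale_to_device_loop(store) -> None:
    """Walk forward over the to-device stream in the background and delete
    messages that the MSC3944 rules consider stale.  A real version would
    run via Synapse's background-process helpers rather than a bare loop."""
    last_stream_id = await store.get_stale_pruning_position()

    while True:
        max_stream_id = await store.get_max_to_device_stream_id()
        if max_stream_id > last_stream_id:
            # Only look at rows that arrived since the last pass; anything
            # already delivered to the device is simply skipped.
            await store.delete_stale_to_device_messages(
                from_stream_id=last_stream_id,
                to_stream_id=max_stream_id,
            )
            await store.set_stale_pruning_position(max_stream_id)
            last_stream_id = max_stream_id

        await asyncio.sleep(STALE_CHECK_INTERVAL_SECS)
```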

H-Shay (Contributor) commented Aug 7, 2023

related: #3599

H-Shay (Contributor) commented Aug 7, 2023

+1 for the background loop - that seems reasonable.

erikjohnston (Member) commented

There's a race we need to be careful to consider here:

  1. The server receives a key request for a device.
  2. The device syncs and receives the key request, which does not delete the row from the DB yet.
  3. The server receives the cancellation for the device, sees the key request in the table, deletes both the request and cancellation.
  4. The device syncs again and doesn't receive the cancellation (this would be the point where the server would delete the key request row).

Do we care about this race? It feels like it's no worse than the existing race where a client receives the cancellation after it has already processed the request?

We could keep track of which rows we've already sent down to clients, but that feels like extra bookkeeping I'd rather avoid.


Separately, I think we need to flesh out what the background job actually looks like in practice, as we'd need to track data (such as in-flight requests), and where would we store it?

It might be easier if we extend the table to pull some of the information out of the JSON so that we can efficiently look up key requests by request ID, which would make it easier to handle cancellations. For verification requests we would want a received timestamp. We could also put these in a separate table?
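
To make that concrete, one possible shape for a side table, purely illustrative; the table, column, and index names below are made up and not anything Synapse actually uses:

```python
# Hypothetical schema sketch: pull the fields MSC3944 needs out of the JSON
# into their own columns so that cancellations and old verification requests
# can be found with an index lookup rather than by parsing every row.
CREATE_TO_DEVICE_METADATA_TABLE = """
CREATE TABLE device_inbox_stale_metadata (
    stream_id    BIGINT NOT NULL,
    user_id      TEXT   NOT NULL,
    device_id    TEXT   NOT NULL,
    message_type TEXT   NOT NULL,   -- e.g. m.room_key_request
    request_id   TEXT,              -- for key requests / cancellations
    received_ts  BIGINT NOT NULL    -- for expiring old verification requests
);

CREATE INDEX device_inbox_stale_metadata_request_id
    ON device_inbox_stale_metadata (user_id, device_id, request_id);
"""
```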

clokep (Member, Author) commented Aug 11, 2023

We could keep track of which rows we've already sent down to clients, but that feels like extra bookkeeping I'd rather avoid.

I think from the server's POV we can't tell if we've only sent things to a client or if they've actually processed them, so we just might not care?

It might be easier if we extend the table to pull some of the information out of the JSON so that we can efficiently look up key requests by request ID, which would make it easier to handle cancellations. For verification requests we would want a received timestamp. We could also put these in a separate table?

I think we probably do, but for an initial version I was going to pull the fields out of JSON. (It might not be able to land this way, but I think it'll let us get the rest of the mechanics down.)
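
For example, the verification-request rule could initially be evaluated entirely from the stored JSON, along these lines. This is a sketch: the helper is hypothetical, and the 10-minute cut-off mirrors the client-side rule in the spec:

```python
import json
import time

# Clients are required to ignore m.key.verification.request messages whose
# `timestamp` is more than 10 minutes in the past, so an undelivered request
# that old should be safe to drop.  The exact cut-off to use server-side is
# an assumption here.
VERIFICATION_REQUEST_MAX_AGE_MS = 10 * 60 * 1000


def is_stale_verification_request(message_json: str) -> bool:
    """Decide, purely from the stored JSON, whether an undelivered
    m.key.verification.request is old enough to discard."""
    content = json.loads(message_json)
    ts = content.get("timestamp")
    if not isinstance(ts, int):
        return False  # can't tell how old it is, so keep it

    now_ms = int(time.time() * 1000)
    return now_ms - ts > VERIFICATION_REQUEST_MAX_AGE_MS
```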
