-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
completely broken room after purge_room/rejoin #11521
Comments
Of course, if we had foreign key constraints, the failure mode would be a lot less insidious :/ |
Cache invalidation in this case is a bit tricky, because we don't actually know which events we're purging, and most of the caches in question aren't structured in a way that makes it easy to purge them by room id. |
|
I see that one (only?) affected cache is I'm not sure how this change will affect external cache support, though I don't think that's merged yet, so a migration would not yet be needed. Edit: the downside is that you'd need to know the room ID when checking the cache to get an efficient lookup. |
An alternative could be to maintain a separate dict cache of |
I've made good headway into converting However, it is not the only event-related cache we need to deal with here. Both I understand Having a second layer which keeps entries even after they've been removed from the cache just seems like a recipe for races. |
The PRs that introduced
and they help in understanding why these caches were implemented. We need to invalidate these by room ID as well, though this is tricky and raises some question. Unfortunately neither of these caches are a strict subset of the contents of If not, and thus we can't re-use the work from invalidating one cache, then that leaves the solution of having yet more room ID -> event ID maps that are maintained (one for each cache) or simply iterating over all entries, grabbing the room ID and invalidating those that match. That may not be very performant, though I haven't done any performance testing to verify. I'd like to say purging a room doesn't happen very often, but it will if we introduce purge-room-on-forget (#4720, the feature driving this investigation). I'm also not sure if invalidating @erikjohnston I'm interested in your thoughts specifically, since you authored much of this code. |
@anoadragon453 I haven't read this thread but maybe you're interested in what the synapse/synapse/storage/databases/main/events_worker.py Lines 1527 to 1530 in efabf44
You can see that we invalidate via synapse/synapse/storage/databases/main/purge_events.py Lines 304 to 306 in 2b522cc
Found while tooling around on #13856 and thought you might be interested. |
Hey @MadLittleMods, thanks for the suggestion! Unfortunately |
when you purge a room (via the delete room api), we do not clear the in-memory caches. Then, when you rejoin, we have a bunch of the events in the cache, so do not bother to re-persist them.
This leads to very bad brokenness, like
state_groups_state
referring to events which do not exist.(A workaround is of course to restart synapse after purging and before rejoining)
The text was updated successfully, but these errors were encountered: