Description
TLDR: To determine if you are affected by this problem, run the following query:
```sql
select room_id, count(*) c from event_forward_extremities group by room_id order by c desc limit 20;
```
Any rows showing a count of more than a handful (say 10) are cause for concern. You can probably gain some respite by running the query at #1760 (comment).
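If you'd rather script the check, here's a minimal sketch of the same query run from Python, assuming a PostgreSQL-backed Synapse and psycopg2; the database name and user are placeholders, not anything from this issue:

```python
# Minimal sketch: run the detection query and flag rooms with more than
# a handful (say 10) of forward extremities. Connection parameters are
# placeholders -- adjust for your deployment.
import psycopg2

conn = psycopg2.connect(dbname="synapse", user="synapse_user")
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT room_id, count(*) AS c
        FROM event_forward_extremities
        GROUP BY room_id
        ORDER BY c DESC
        LIMIT 20
        """
    )
    for room_id, c in cur.fetchall():
        flag = "  <-- cause for concern" if c > 10 else ""
        print(f"{room_id}: {c}{flag}")
```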
Whilst investigating the cause of heap usage spikes in Synapse, I correlated jumps in RSZ with the logs: 'resolving state for !curbaf with 49 groups' loglines took ages to execute and temporarily consumed a lot of heap, resulting in a permanent hike in RSZ (as Python is bad at reclaiming heap).
On looking at the groups being resolved, it turned out that these were the forward extremities of the room in question: whenever Synapse queries the current room state, it has to merge all of them together, and the implementation of that merge is currently very slow. To clear the extremities, one has to talk in the room (each message 'heals' up to 10 extremities, since the maximum number of prev-events for a message is 10).
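That manual workaround can be scripted. Here's a hedged sketch using the standard client-server API; `HOMESERVER`, `ACCESS_TOKEN` and `ROOM_ID` are placeholders, and the ceiling of 10 extremities per message is the heuristic described above, not a guarantee:

```python
# Sketch: send enough messages into the room that each one can "heal"
# up to 10 extremities. Placeholders below must be filled in.
import time
import requests
from urllib.parse import quote

HOMESERVER = "https://matrix.example.com"  # assumption: your HS base URL
ACCESS_TOKEN = "<access token>"            # assumption: a valid token
ROOM_ID = "!curbaf:example.com"            # the affected room

def send_healing_messages(extremity_count: int) -> None:
    messages_needed = -(-extremity_count // 10)  # ceil(n / 10)
    for i in range(messages_needed):
        txn_id = f"heal-{int(time.time() * 1000)}-{i}"  # unique txn id
        resp = requests.put(
            f"{HOMESERVER}/_matrix/client/r0/rooms/{quote(ROOM_ID)}"
            f"/send/m.room.message/{txn_id}",
            params={"access_token": ACCESS_TOKEN},
            json={"msgtype": "m.text", "body": "healing extremities"},
        )
        resp.raise_for_status()
```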
Problems here are:
- Why are we accumulating so many extremities? I assume the graph breaks whenever there's some downtime, leaving dangling nodes.
- Is there a way to stop them accumulating by healing or discarding them on launch (e.g. by sending a null healing event into the room)?
- Why is state resolution so incredibly heavy? There should hardly be any conflicting state here, unless the bifurcation has been going on for months. Is it because, to auth potential conflicts, we have to load all the auth events, which include every m.room.member event?
- DEBUG logs of a state resolution captured from arasphere also show lots of thrashing on the rejections table.
- We're also seeing ominous pauses in the logging of requests which resolve state, as if there's some lock we're contending for. (This might be the same problem as Ralith's #1774, 'HS OOMed after repeatedly loading events in get_current_user_in_room'.)
- Can we just insert dummy nodes into our local copy of the DAG after a successful state resolution, to avoid having to constantly re-calculate it or rely on naive caching? (A rough sketch of this idea follows below.)
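To make the last idea concrete, here is a speculative sketch of what such a dummy node might look like: a no-op event whose prev_events reference every current forward extremity, collapsing them all into one. The event type, helper name and simplified structure are illustrative assumptions, not Synapse's actual behaviour:

```python
# Speculative sketch only: a no-op "dummy" event referencing all of a
# room's current forward extremities as prev_events, so a single event
# collapses them into one extremity (instead of at most 10 per ordinary
# message). Heavily simplified: real events also need depth, hashes and
# signatures before they can be inserted into the DAG.
from typing import Dict, List

def make_dummy_event(
    room_id: str, sender: str, extremity_ids: List[str]
) -> Dict:
    return {
        "type": "org.example.dummy",  # hypothetical no-op event type
        "room_id": room_id,
        "sender": sender,
        "content": {},
        # Each prev_events entry is [event_id, hashes]; hashes omitted here.
        "prev_events": [[event_id, {}] for event_id in extremity_ids],
    }
```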