-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Improve/fix application service transaction batching. #8604
Comments
I'm wondering if this actually helps though? Presumably sending batches of N size is also going to delay the appservice as soon as it hits some events it needs to do a lot of work on. To be honest, I'd rather appservices didn't block transaction processing, as we don't need to send the result of the processing back to the homeserver. The AS should really keep accepting new transactions in quick succession regardless of the event contents. We should formally spec a max size of events though, because most appservices do limit the byte size of the payload.
This seems a bit silly yep, but feels like a Synapse problem (not a spec, the spec doesn't have opinions about retry behavior) where it could be more efficient about it's batching. You can't change the size of a transaction once you've attempted to send it, but subsequent transactions that are pending are mutable so you could keep extending the next transaction to be sent up to MAX_SIZE. |
These are implementation details and unrelated to the spec - moving to the synapse repo. |
Woops! I really meant it to be in the Synapse repo :/ |
The problem is that you end up with a delay added on top of however long the AS takes to process an event. This delay is then never recovered from as the AS keeps struggling with large transactions (and can end up in a situation where it gets further and further behind because of larger and larger transactions). We've seen this reported in the wild. If the AS returns a 200 OK immediately then a) you lose reliability if the AS dies/is restarted and b) there is no longer any back pressure if the AS starts falling behind, and Synapse will just continue spamming the AS with events until it falls over in a heap.
There is a max size of events, its just that doesn't help if we don't bound the max number of events in a transaction |
Ah, I did review this with a spec hat on so my comments may have been tainted with a spec side view - sorry :s.
Hmm, I guess smaller transactions on recovery allows the AS to at least report to the HS that it's making progress, rather than flopping over because it didn't complete the entire transaction. That's a fair point. I would like to see what the AS could do to limit work done per event. There's no point blocking transactions while the bridge is waiting for the HS to finish a join or a leave, for example.
Misspoke, I agree with this :). |
at least half of this is a duplicate of #3536 |
#3536 is now fixed (we decided on a fixed 100 events per transaction) which solves concern 1. |
@erikjohnston to clarify the remaining work here |
Split out remaining work to #8704 |
The following two things come up semi regularly:
The text was updated successfully, but these errors were encountered: