Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Idempotent transaction send requests should be coalesced & duplicate suppressed #1742

Closed
ara4n opened this issue Dec 31, 2016 · 3 comments
Closed
Labels
A-Federation z-p2 (Deprecated Label)

Comments

@ara4n
Copy link
Member

ara4n commented Dec 31, 2016

The problems in #1733 are somewhat exacerbated by the origin HS retrying the send txn 10 times over the course of 20 minutes. If the problem is simply that the txn is taking a long time to process (due to the lineariser lock in #1729), then we should surely just coalesce these requests together and return the same result when they all complete, rather than having them stack up.

@ara4n ara4n changed the title Idempotent transaction send requests should be coalesced Idempotent transaction send requests should be coalesced & duplicate suppressed Dec 31, 2016
@ara4n
Copy link
Member Author

ara4n commented Jan 5, 2017

A good example of this bad behaviour is shown here from @Ralith's HS:

sierra:ralith matthew$ cat synapse-pegged-2.log | grep '_matrix/federation/v1/send/1483544392748'
Jan 04 20:44:35 ralith.com synapse[2317]: synapse.access.http.8008: [PUT-881047] 37.59.43.212 - 8008 - Received request: PUT /_matrix/federation/v1/send/1483544392748/
Jan 04 20:50:31 ralith.com synapse[2317]: synapse.access.http.8008: [PUT-881181] 37.59.43.212 - 8008 - Received request: PUT /_matrix/federation/v1/send/1483544392748/
Jan 04 20:50:32 ralith.com synapse[2317]: synapse.access.http.8008: [PUT-881212] 37.59.43.212 - 8008 - Received request: PUT /_matrix/federation/v1/send/1483544392748/
Jan 04 20:50:37 ralith.com synapse[2317]: synapse.access.http.8008: [PUT-881270] 37.59.43.212 - 8008 - Received request: PUT /_matrix/federation/v1/send/1483544392748/
Jan 04 20:51:26 ralith.com synapse[2317]: synapse.access.http.8008: [PUT-880923] 37.59.43.212 - 8008 - {safaradeg.net} Processed request: 600520ms (110747ms, 286ms) (389ms/3) 48B 200 "PUT /_matrix/federation/v1/send/1483544392748/ HTTP/1.1" "Synapse/0.18.6-rc2"
Jan 04 20:51:27 ralith.com synapse[2317]: synapse.access.http.8008: [PUT-881270] 37.59.43.212 - 8008 - {safaradeg.net} Processed request: 50076ms (7ms, 0ms) (0ms/0) 48B 200 "PUT /_matrix/federation/v1/send/1483544392748/ HTTP/1.1" "Synapse/0.18.6-rc2"
Jan 04 20:51:28 ralith.com synapse[2317]: synapse.access.http.8008: [PUT-881212] 37.59.43.212 - 8008 - {safaradeg.net} Processed request: 55766ms (13ms, 0ms) (0ms/0) 48B 200 "PUT /_matrix/federation/v1/send/1483544392748/ HTTP/1.1" "Synapse/0.18.6-rc2"
Jan 04 20:51:28 ralith.com synapse[2317]: synapse.access.http.8008: [PUT-880954] 37.59.43.212 - 8008 - {safaradeg.net} Processed request: 590209ms (110263ms, 319ms) (168ms/3) 48B 200 "PUT /_matrix/federation/v1/send/1483544392748/ HTTP/1.1" "Synapse/0.18.6-rc2"
Jan 04 20:51:29 ralith.com synapse[2317]: synapse.access.http.8008: [PUT-881047] 37.59.43.212 - 8008 - {safaradeg.net} Processed request: 413523ms (103655ms, 297ms) (315ms/3) 48B 200 "PUT /_matrix/federation/v1/send/1483544392748/ HTTP/1.1" "Synapse/0.18.6-rc2"
Jan 04 20:51:31 ralith.com synapse[2317]: synapse.access.http.8008: [PUT-881181] 37.59.43.212 - 8008 - {safaradeg.net} Processed request: 59724ms (614ms, 33ms) (345ms/7) 48B 200 "PUT /_matrix/federation/v1/send/1483544392748/ HTTP/1.1" "Synapse/0.18.6-rc2"
Jan 04 20:52:14 ralith.com synapse[2317]: synapse.access.http.8008: [PUT-881378] 37.59.43.212 - 8008 - Received request: PUT /_matrix/federation/v1/send/1483544392748/
Jan 04 20:52:14 ralith.com synapse[2317]: synapse.access.http.8008: [PUT-881378] 37.59.43.212 - 8008 - {safaradeg.net} Processed request: 748ms (7ms, 1ms) (12ms/1) 48B 200 "PUT /_matrix/federation/v1/send/1483544392748/ HTTP/1.1" "Synapse/0.18.6-rc2"

The federation rate limiter lets us execute 3 of these transactions in parallel. They stack up (due to requiring to resolve state for a 23-prev-event event from #rust), and slow each other down catastrophically. The others block, and then eventually execute relatively quickly thanks to the result of the merge resolution being cached.

Needless to say, we should never have been executing these in parallel in the first place.

@ara4n
Copy link
Member Author

ara4n commented Jan 5, 2017

it also looks like the first request will only terminate once the other two processing in parallel finish, which consumes resources and slows things down even more (and causes the spurious final retry from the origin HS). I'm unclear on what lock is responsible for that.

@ara4n ara4n added federation-meltdown z-p2 (Deprecated Label) labels Jan 5, 2017
@richvdh
Copy link
Member

richvdh commented Oct 10, 2017

fixed (at last) by #2518.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Federation z-p2 (Deprecated Label)
Projects
None yet
Development

No branches or pull requests

3 participants