Federated joins failing on 0.19.3 #1893
Comments
So I have the same problem now. I joined #megolm:matrix.org at 2017-03-29 22:21:12 UTC, and now, almost 4 days later, people still don't see that I'm in the room. I also still get the same error message when using Riot on Android, but riot-web doesn't seem to show an error. I'm using Synapse 0.19.3. I've saved the log from that day in case it's useful.
@kroeckx sent me the full logs from that day, and here's the request which spectacularly fails. The first server it tries to call /make_join on isn't even alive (although I'm assuming the new federation retry fixes in 0.20 will solve that), but then the request fails entirely when it tries to use kolm.io for the make_join. So: why doesn't it retry further? And why does kolm.io's signature check fail? @richvdh this feels similar to the join problems you were looking at the other week; is this related or a new issue?
Yes, there is more reuse of the 'backoffs' list in 0.20, which means that it is less likely to pick a dead server for a join. In this case it's not much of a problem because it switches to another server within 300ms.
It doesn't use
That is almost certainly due to #2034 (also fixed in 0.20). It shouldn't be a significant problem here. So from the point of view of roeckx.be, this looks like a successful join. The question is why that join didn't get propagated to other servers in the federation.
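For readers following the retry discussion, here is a rough sketch of the general idea of a shared backoff list steering a join away from dead servers. It is not Synapse's code: `Backoffs`, `try_make_join`, the `attempt` callback and the 60-second delay are all hypothetical names and values chosen for illustration.

```python
import time


class Backoffs:
    """Hypothetical shared record of servers that recently failed."""

    def __init__(self):
        # server name -> earliest time (in seconds) at which we may retry it
        self._retry_after = {}

    def mark_failed(self, server, delay_s, now=None):
        now = time.monotonic() if now is None else now
        self._retry_after[server] = now + delay_s

    def is_backing_off(self, server, now=None):
        now = time.monotonic() if now is None else now
        return self._retry_after.get(server, 0.0) > now


def try_make_join(candidates, backoffs, attempt, now=None, delay_s=60.0):
    """Try candidate servers in order, skipping any that are backing off.

    `attempt(server)` stands in for the real request and is assumed to
    raise ConnectionError when the server is unreachable.
    Returns the first server that answered, or None if none did.
    """
    for server in candidates:
        if backoffs.is_backing_off(server, now):
            continue  # known-dead server: do not waste a request on it
        try:
            attempt(server)
        except ConnectionError:
            backoffs.mark_failed(server, delay_s, now)
            continue  # fall through to the next candidate
        return server
    return None
```

Because the backoff record is shared between calls, a server that failed during one join attempt is skipped by the next one until its delay has expired.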
On sw1v.org:
The join event is rejected. I have a horrid feeling we might have broken joining over federation in 0.20.
Hum, no, the join event is supposed to come from the joining server rather than the helper, so it's correct that sw1v.org rejected the join when it came from darmstadt.ccc.de.
So it looks like joining rooms over federation has been somewhat broken ever since synapse 0.18.7, which included commit e10c527.

When a join happens over federation, the joining server creates a join event and sends it to another server; it is that other server which is responsible for sending out the join event to the rest of the federation. The reasons it's done that way round are slightly shrouded in the mysteries of time, but the point is, e10c527 broke it, so that all the other servers in the federation will reject the join, and only the two servers involved know that the new user has joined the room.

We've mostly been getting away with this because there's a high chance that the "other" server involved is matrix.org, which means that most other users know that the user has joined, and there's enough traffic in the room to trigger the "missing state" resolution mechanism. In this case, the problem is exacerbated by the fact that we have chosen

The fix is relatively easy: change the code affected in e10c527 to accept these join events. Once a few servers in the federation are updated, new join events should at least get propagated to those servers, and hopefully that should be enough to get traffic flowing, so that the missing-state mechanism will get the join to other servers.
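To make the failure mode concrete, here is a toy model of the join dance described above. It is only an illustration under stated assumptions, not Synapse's implementation: the `Server` class, the `strict_origin_check` flag, `federated_join` and the `.example` server names are all hypothetical, and the real check touched by e10c527 is more involved than a single string comparison.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical homeserver that tracks room membership."""

    name: str
    # True models the broken behaviour: only accept an event when it is
    # delivered by the server that created it.
    strict_origin_check: bool
    members: set = field(default_factory=set)

    def receive_join(self, user_id: str, sent_by: str) -> bool:
        # The event's origin is the domain part of the joining user's ID.
        event_origin = user_id.split(":", 1)[1]
        if self.strict_origin_check and sent_by != event_origin:
            return False  # rejected: delivered by someone other than the origin
        self.members.add(user_id)
        return True


def federated_join(user_id, resident, others):
    """Model the dance: the joining server hands its join event to a
    resident server, which then relays it to the rest of the federation.
    Returns a mapping of server name -> whether it accepted the join."""
    joining_server = user_id.split(":", 1)[1]
    # Step 1: the joining server sends its own event to the resident server.
    resident.receive_join(user_id, sent_by=joining_server)
    # Step 2: the resident server relays the event to everyone else.
    return {s.name: s.receive_join(user_id, sent_by=resident.name) for s in others}
```

In this model, every server with `strict_origin_check=True` rejects the relayed event, so only the joining and resident servers know about the join, while servers with the check relaxed accept it.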
(It also looks like sending a message in the room is enough to make the join propagate to other servers, after a few minutes.)
Make sure that we accept join events from any server, rather than just the origin server, to make the federation join dance work correctly. (Fixes #1893).
Fixed by #2094, hopefully.
I joined #riot:matrix.org over 24 hours ago, but other home servers are still not showing it. After a few hours I tried to leave and join again, and the other servers seem to show that I joined and left at that time, so as far as they are concerned I still haven't joined. I didn't have this problem with joining any other room. The account is @kurt:roeckx.be. When I open that room in Riot I get: Matrix error: Forbidden.