This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
Better handling of (reverse) connection issues in federation requests #15279
Labels
A-E2EE
End-to-end encryption for Matrix clients
A-Federation
O-Occasional
Affects or can be seen by some users regularly or most users rarely
S-Minor
Blocks non-critical functionality, workarounds exist.
T-Task
Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
Z-Help-Wanted
We know exactly how to fix this issue, and would be grateful for any contribution
Scenario
To start this off, I'll describe the scenario I just went through:
In an encrypted group chat, the messages of a few people, all with matrix.org accounts, suddenly could not be decrypted.
The client makes it look like a local issue, as if another client still needs to share the keys, but since all my clients showed the same message, it was clear that this was not the case.
Looking at the server log I found a lot of lines similar to this:
No further information is given, just a "401 Unauthorized", even with DEBUG logging enabled. I went ahead and worked out the parameters of the request to send it myself, but that only got me another unclear error message:
From a few other issues, I learned that people had run into similar problems when their TLS certificates were invalid, but that wasn't the case here.
But that got me thinking that there might actually be some kind of connection issue, and indeed, after some digging with tcpdump, I found that the incoming TCP session from the matrix.org server wasn't fully established. In this case it seemed to be a routing issue: connectivity between Mythic Beasts and one of my transit providers was somehow broken, and after I disabled those routes everything worked perfectly.
I probably spent a few hours debugging this, because there was absolutely no indication of what was actually going wrong. That was really annoying.
Suggestion
When federation requests are sent out and a key lookup in the reverse direction fails, it would be great to have additional information returned with that error.
Something like...
...which then also gets printed to the server logs instead of the simple "401 Unauthorized" would be great.
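A minimal Python sketch of what such a response could carry, assuming the server that rejects the request knows which origin it tried to fetch signing keys for and which host:port it connected to. Only the M_UNAUTHORIZED error code comes from the Matrix specification; KeyFetchFailure, unauthorized_body, log_remote_auth_failure and the "details" field are hypothetical names for illustration, not Synapse's actual code.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("federation.auth")  # hypothetical logger name


@dataclass
class KeyFetchFailure:
    """Why a reverse key lookup failed (hypothetical structure)."""

    origin: str       # server that sent the request and whose keys we needed
    destination: str  # "host:port" we actually tried to fetch the keys from
    reason: str       # short human-readable cause, e.g. "connection timed out"


def unauthorized_body(failure: KeyFetchFailure) -> dict:
    """Responding side: build a 401 body that says *why* auth failed."""
    message = (
        f"Could not fetch signing keys for {failure.origin} "
        f"from {failure.destination}: {failure.reason}"
    )
    return {
        "errcode": "M_UNAUTHORIZED",  # standard Matrix error code
        "error": message,
        # Hypothetical machine-readable extras for the requesting server.
        "details": {
            "origin": failure.origin,
            "destination": failure.destination,
            "reason": failure.reason,
        },
    }


def log_remote_auth_failure(remote: str, status: int, body: dict) -> str:
    """Requesting side: log the remote's explanation, not just the status."""
    line = (
        f"{remote} returned {status} {body.get('errcode')}: "
        f"{body.get('error', 'no details')}"
    )
    logger.warning("%s", line)
    return line
```

In a scenario like the one above, the requesting server's log would then show a line such as "matrix.org returned 401 M_UNAUTHORIZED: Could not fetch signing keys for example.org from matrix.example.org:8448: connection timed out" instead of a bare status code.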
Such error messages could be really helpful; here are some more examples:
That would make debugging these kinds of issues a lot easier. With $servername:$port included in the details, you would also see whether it connects to the right server, which sometimes seems to be a problem for people hosting their servers on a subdomain.
I really hope this gets implemented as this seems like a very important debugging feature in a federated system.