Incorrect HTML responses when providing possibly non existent matrix identifiers #15392
Comments
Is this only an issue with the matrix.org deployment? I'm pretty sure we no longer return HTML error pages inside of Synapse itself.
I tried that just under |
Right, do you also see this on element.io or a test server?
Tried the get profile on element.io and the response looks good: HTTP 502 - Bad gateway
In case of the create room: HTTP 502 - Bad gateway
(and the room was created anyway afterwards)
It's worth noting that we had a chat about this in a meeting recently. Whilst it's undeniably Cloudflare's fault for making an HTML error page from 502s, this can't be turned off as far as we know. In the more severe case of creating a room and failing to invite a user, we currently half-create the room and then return a 502 error. Regardless of whether this error is HTML or JSON, a reasonable client would try to re-create the room. Possibly we should just fully create the room and ignore the error.
Oh, is the complaint that Synapse returns 502 errors which Cloudflare turns into HTML pages?
That's half of it, which I think we do want to consider (possibly at a spec level :/); one suggestion might be to designate another error code as 'a remote homeserver couldn't answer'. The other half of the issue (room creation) is likely related to #8895: we half-create the room but then lie and tell the client that room creation failed, causing any good client to retry, particularly for a 502, which sounds like an error generated by a reverse proxy.
I'm not sure it makes sense to change Synapse's behaviour, and certainly not the Matrix spec, merely due to dubious behaviour by third parties. It may be more worthwhile to ensure clients are robust enough not to choke on unexpected or invalid response bodies. (In this case, since the error code is still correct, it seems like a minor issue.)
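To illustrate the kind of client robustness suggested above, here is a minimal sketch of a helper that reads an error response without assuming the body is JSON. It relies on the standard Matrix error shape (a JSON object with `errcode` and `error` fields); the function name `parse_matrix_error` and the fallback label `CLIENT_NON_JSON_BODY` are made up for this example and are not part of the spec or of any client SDK.

```python
import json
from typing import Tuple

# Hypothetical client-side label for bodies that are not Matrix JSON errors.
NON_JSON_FALLBACK = "CLIENT_NON_JSON_BODY"


def parse_matrix_error(status: int, body: bytes) -> Tuple[str, str]:
    """Return (errcode, message) for a failed response.

    Falls back to the HTTP status when the body is not a Matrix-style
    JSON error, e.g. an HTML page substituted by a reverse proxy.
    """
    try:
        payload = json.loads(body.decode("utf-8"))
    except ValueError:
        # Covers both invalid UTF-8 and invalid JSON.
        payload = None

    if isinstance(payload, dict) and isinstance(payload.get("errcode"), str):
        return payload["errcode"], str(payload.get("error", ""))

    return NON_JSON_FALLBACK, f"HTTP {status} with non-JSON body"
```

With a helper along these lines, a client can still act on the status code (502 here) when the body turns out to be an HTML page rather than a JSON error.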
On the topic of invitation failures during room creation: it's been noted that carrying on with the room creation and returning a 200 (currently the only response that tells the client about the created room) would be no better, particularly in DMs, since the end result would be the application accepting messages it has no hope of delivering. Whilst it's true that the broken rooms created today can also accept messages, the user will either see a messy 502 error at creation (unfortunately not specific enough to understand) or, with the retry scheme, see many rooms appear, so they would notice that something is wrong/unreliable/suspicious. That sets the requirements for an improvement to be something like this:
Given that, we'd likely want to spec something new here to represent a partial success. Here's a rough outline for what I'd do for this exact case, which shouldn't be much server-side work:
As a future extension, it would be good to consider how we can represent a total failure caused by an invitation failure. What I mean is: if every single invite fails (e.g. it's a DM and the only invite fails), it'd be good if Synapse was allowed to destroy the room, since it was never created usefully in the first place.
Backwards compatibility concerns: I'd expect existing clients to treat 600 or the other 'total failure' error as error conditions and do the same thing they do today. But such a scheme would give them the chance to give the user clearer error messages or even friendlier error handling. Today, clients have no choice given the information they receive.
I'd appreciate feedback on the above scheme. Ideally we can keep this light enough that it's doable without a lot of fuss (otherwise it'll never get done). Introducing a more rigorous error framework to Matrix is somewhat of a wish of mine, but it'd be too heavyweight for this.
For everything else: the 502s being swallowed by Cloudflare are a pain, but if they don't cause any real problems then the client handling should be vaguely similar. Maybe we can consider changing the error code if it would be useful (this would be a different MSC to the one above), but if you make a
I think transaction rollback would be the ideal solution. I'm guessing it's a bit harder to roll back here since invites to remote users go over federation? Can we consider certain invite failures to be completely dead in the water? For example, if we get back a non-successful status code for all invites, or fail to connect to all remote servers, then we can roll back. We can't make these guarantees for no response or a timeout, though, as the server could have received the message and will process it later. In those cases, we would still end up needing some partial success like you're proposing, but we could be seamless in a lot of cases 🤷 Rollback is close to my heart because of problems like #15005 where we get a half-baked room and an alias pointing back at it from
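The rule proposed in the comment above could be sketched roughly as follows. This is not Synapse code: the `InviteOutcome` enum and `can_roll_back` function are hypothetical names, and the sketch assumes each remote invite attempt can be put into one of four buckets.

```python
from enum import Enum
from typing import Iterable


class InviteOutcome(Enum):
    """Hypothetical classification of a single invite attempt."""
    DELIVERED = "delivered"
    REJECTED = "rejected"        # remote answered with a non-success status
    UNREACHABLE = "unreachable"  # connection could not be established
    TIMED_OUT = "timed_out"      # no response; remote may still process it


# Outcomes where we know the remote server did not accept the invite.
DEFINITE_FAILURES = {InviteOutcome.REJECTED, InviteOutcome.UNREACHABLE}


def can_roll_back(outcomes: Iterable[InviteOutcome]) -> bool:
    """True only if there was at least one invite and every invite
    definitely failed. A timeout or a delivered invite blocks rollback."""
    results = list(outcomes)
    return bool(results) and all(o in DEFINITE_FAILURES for o in results)
```

Under this rule, a DM whose only invite was rejected could be rolled back, while any timeout would still need the partial-success path discussed earlier in the thread.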
Yeah, I agree there. Without having looked into it deeply, I'd expect it to be quite tricky to do, though possibly made easier by the new batch event persistence?
Description
Calling APIs with (possibly) nonexistent Matrix identifiers (like @alfotest:some) triggers non-JSON responses from the homeserver with HTTP 502 errors.
Examples of such APIs are:
@alfotest:some
]) since even if the client gets back an HTTP 502 and an HTML response, the room creation actually succeeds.
Steps to reproduce
Homeserver
matrix.org
Synapse Version
{"server_version":"1.80.0 (b=matrix-org-hotfixes,ab0a5f1972,dirty)","python_version":"3.8.12"}
Installation Method
I don't know
Database
I don't know
Workers
I don't know
Platform
I don't know
Configuration
I don't know
Relevant log output
Anything else that would be useful to know?
No response