This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

POST https://matrix-client.matrix.org/_matrix/client/r0/join/%23libera%3Alibera.chat errors with 502 #14596

Open
progval opened this issue Dec 2, 2022 · 23 comments
Labels
  • A-Federated-Join: joins over federation generally suck
  • O-Occasional: Affects or can be seen by some users regularly or most users rarely
  • S-Major: Major functionality / product severely impaired, no satisfactory workaround.
  • T-Defect: Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@progval
Contributor

progval commented Dec 2, 2022

Description

It is impossible to join #libera:libera.chat from matrix.org

POST requests to https://matrix-client.matrix.org/_matrix/client/r0/join/%23libera%3Alibera.chat error with Cloudflare's standard 502

I don't know how long this has been going on, but people have been complaining about it in #irc:matrix.org and #libera-matrix:libera.chat for about a month.

Steps to reproduce

  • Try to join #libera:libera.chat from matrix.org
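To reproduce this outside a client, the request path can be built by percent-encoding the room alias. This is a minimal sketch: the helper name `join_url` is hypothetical, and a real request would also need an access token, which is left out here.

```python
from urllib.parse import quote


def join_url(base_url: str, room_alias: str) -> str:
    """Build the client-server join URL for a room alias.

    `join_url` is a hypothetical helper, not part of any Matrix SDK.
    safe="" makes quote() escape '#' and ':' as well.
    """
    return f"{base_url}/_matrix/client/r0/join/{quote(room_alias, safe='')}"


print(join_url("https://matrix-client.matrix.org", "#libera:libera.chat"))
```

The printed URL matches the one quoted in the description, so a 502 on a POST to it should reproduce the report.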

Homeserver

matrix.org

Synapse Version

1.73.0rc2 (b=matrix-org-hotfixes,34fa1276a4)

Installation Method

I don't know

Database

postgresql

Workers

I don't know

Platform

n/a

Configuration

No response

Relevant log output

n/a

Anything else that would be useful to know?

No response

@clokep
Member

clokep commented Dec 2, 2022

Probably a duplicate of #14462?

@DMRobertson
Contributor

Or #14480

@DMRobertson
Contributor

It is impossible to join #libera:libera.chat from matrix.org

Are you able to join this room from another HS?

@progval
Contributor Author

progval commented Dec 2, 2022

I just tried on envs.net. First, Element showed a spinner forever, so I reloaded the page. Now I get {"errcode":"M_UNKNOWN","error":"Internal server error"}.

@iakat
Contributor

iakat commented Dec 2, 2022

Getting the same error from my HS, which has no issue federating.

@Diablo-D3

This is also affecting ##tea

@apos0

apos0 commented Dec 6, 2022

Can't join #hardware, #libera, or #bash from either data.haus or matrix.org, but somehow #ffmpeg works OK.

@realtyem
Contributor

realtyem commented Dec 6, 2022

I'm wondering if this is related to the 502 errors that have been seen around #14103 lately?

@apos0

apos0 commented Dec 6, 2022

Just for reference, I was able to join the channels. They just appeared in my list after many hours. Very weird. Maybe the Libera servers are overloaded?

@schickling

Also affects #metabrainz

@holdenger

Probably affects #fedora-cs too.

@reivilibre self-assigned this Dec 8, 2022
@reivilibre
Contributor

reivilibre commented Dec 12, 2022

I had a look at the logs for matrix.org and libera.chat. Shay made an attempt to join on the 12th and this is what happened:

matrix.org

  • 19:41:36,978: the /join request was routed to event_creator_users2 (POST-77368a818d0e645c-SJC)
  • 19:41:37,126: the remote join request arrived at the master process (POST-6133673)
    • a /make_join completes within 600ms against libera.chat
    • a /send_join request is started
    • ... but then times out at 19:42:37,730
    • (there are then 3 retries; the final one times out at 19:45:41,293)
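The gaps between the matrix.org timestamps above can be checked with a small sketch. It assumes all timestamps fall on the same day, and it measures from the moment the request reached the master process, so the first figure also includes the /make_join round trip. `seconds_between` is a hypothetical helper.

```python
from datetime import datetime


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two same-day 'HH:MM:SS,mmm' log timestamps."""
    fmt = "%H:%M:%S,%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Timestamps copied from the matrix.org log notes above.
arrived_at_master = "19:41:37,126"
first_timeout = seconds_between(arrived_at_master, "19:42:37,730")
last_timeout = seconds_between(arrived_at_master, "19:45:41,293")

print(f"first /send_join attempt gave up after about {first_timeout:.1f} s")
print(f"final retry gave up after about {last_timeout:.1f} s")
```

The first attempt gives up after roughly a minute, and the whole sequence of retries is abandoned after roughly four minutes.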

libera.chat

  • 19:41:40,592: first sign of something happening with the send_join: Received response to POST http://event-persister-repl:9092/_synapse/replication/fed_send_events/oaOXaVacsy: 200 (PUT-5881, federation reader 0)
    • this is only ~4 seconds after the /join request was made by the person on matrix.org, so what went wrong? ...
    • 19:42:37,739: Connection from client lost before response was sent
    • the request finally completes at 20:01:27,582: Processed request: 1189.458sec/-1129.843sec (17.792sec, 0.652sec) (0.071sec/95.973sec/17) 0B 200! ... [20 dbevts]
      • 95 s of DB time is not great, 18 s of CPU time is not great ... but it's not obvious where the 1189 s of wall clock time is going??
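To put a number on the unexplained wall clock time, the "Processed request" line can be split into its fields. This is a rough sketch: the field names (wall time, response send time, user and system CPU time, DB scheduling time, DB transaction time, transaction count) are my reading of the log layout and are not confirmed in this thread. CPU and DB time may also overlap, so the result is only an estimate.

```python
import re

# Assumed field layout; the group names are illustrative, not official.
PATTERN = re.compile(
    r"Processed request: (?P<wall>[\d.]+)sec/(?P<send>-?[\d.]+)sec "
    r"\((?P<utime>[\d.]+)sec, (?P<stime>[\d.]+)sec\) "
    r"\((?P<db_sched>[\d.]+)sec/(?P<db_txn>[\d.]+)sec/(?P<txn_count>\d+)\)"
)


def unaccounted_seconds(line: str) -> float:
    """Wall clock time minus CPU time and DB time, in seconds."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a 'Processed request' line")
    f = {name: float(value) for name, value in match.groupdict().items()}
    busy = f["utime"] + f["stime"] + f["db_sched"] + f["db_txn"]
    return f["wall"] - busy


# Only the timing prefix of the log line quoted above is used here.
line = (
    "Processed request: 1189.458sec/-1129.843sec "
    "(17.792sec, 0.652sec) (0.071sec/95.973sec/17)"
)
print(f"unaccounted: {unaccounted_seconds(line):.3f} s")
```

Under those assumptions roughly 1075 s of the 1189 s is neither CPU nor DB time, which fits the idea that the request spent most of its life waiting on something else, such as a queue.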

(Note: all the retries are a little bit faster (likely caching); but still the fastest was 400 sec — not fast enough to respond in time.)
(Another note: one of the retries says PUT-5944- Ratelimiter(matrix.org): queueing request (queue now 9 items) — I wonder if we're harming response times because of rate limiting? From the logs it's not clear how long the request is delayed for, though.)

Conclusion though: the Libera server isn't responding in time.
With that said: the CPU and DB time graphs for federation reader 0 on Libera look very quiet — struggling to believe this is being overloaded. Similarly, the event persister looks quiet at that point in time — I don't expect that the issue is waiting for the join to be persisted.

Summary:

  • The Libera server isn't responding in time.
  • I can't see why that is — the Synapse workers look pretty quiet at the time in question — if the issue was the resources needed to process the request, I'd expect to be able to see something being overloaded.

edit: Rich suspects the ratelimiter isn't working correctly.

@reivilibre removed their assignment Dec 15, 2022
@DMRobertson
Contributor

For xrefs: we suspect #14480 is related, if not the cause

@H-Shay added the T-Defect, O-Occasional, and S-Major labels Jan 9, 2023
@erikjohnston
Member

Let's re-investigate once libera.chat is on 1.75.0?

@quite

quite commented Jan 20, 2023

Is this related to the failed bridging setup that happens towards various libera-channels? Like in: matrix-org/matrix-appservice-irc#1652

@DMRobertson
Contributor

Let's re-investigate once libera.chat is on 1.75.0?

This should be done now. We need to investigate whether these problems are still occurring.

@ht990332

I am still unable to join #libera via https://app.element.io/?updated=1.11.19#/room/#libera:libera.chat.

@quite

quite commented Jan 26, 2023

Is this related to the failed bridging setup that happens towards various libera-channels? Like in: matrix-org/matrix-appservice-irc#1652

No change for this issue. That issue is still with Scalar it seems.

@Diablo-D3

##tea is now free from this bug.

@erikjohnston
Member

If people still see this, can you open a new issue and include logs? Thanks!

@ara4n reopened this Feb 21, 2023
@ara4n
Member

ara4n commented Feb 21, 2023

#15115 is closely related to this. It wasn't Libera timing out in that instance, though.

@JeanPaulLucien

JeanPaulLucien commented Feb 23, 2023

Maybe these issues are related too.
element-hq/element-web#24617 -> #15145
#15142

@JeanPaulLucien

JeanPaulLucien commented Feb 23, 2023

It is impossible to join #libera:libera.chat

It's possible to join: element-hq/element-web#24482
I've collected the various errors from libera.chat. Seems it's random.
