Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Unable to join some rooms on matrix.org #13853

Closed
593kdk32idk32 opened this issue Sep 20, 2022 · 10 comments
Closed

Unable to join some rooms on matrix.org #13853

593kdk32idk32 opened this issue Sep 20, 2022 · 10 comments
Labels
A-Federated-Join joins over federation generally suck O-Frequent Affects or can be seen by most users regularly or impacts most users' first experience O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@593kdk32idk32
Copy link

Description

This error happens on an IPv4 only system. There is no IPV6 available, not even localhost. I believe this error is triggered when trying to join or communicate with rooms that are on servers with both IPv4 and IPv6.

The issue was discovered while trying to figure out why I can't join #matrix:matrix.org. The log below happened right as attempting to join #matrix:matrix.org failed, but I can't gurantee they are related.

Steps to reproduce

Simply run Synapse on a system without IPV6. Here's an example of my homeserver.yaml file.
listeners:

  • port: 8008
    tls: false
    type: http
    x_forwarded: true
    bind_addresses: ['172.16.10.84']
    resources:
    • names: [client, federation]
      compress: false

Homeserver

(Private server)

Synapse Version

{"server_version":"1.67.0","python_version":"3.10.7"}

Installation Method

pip (from PyPI)

Platform

The operating system is FreeBSD 13.1. Synapse is run in a custom chroot and listens on a localhost bound IP. Synapse is proxied by Nginx.

Relevant log output

2022-09-20 16:52:40,494 - twisted - 274 - CRITICAL - sentinel - Unhandled error in Deferred:
2022-09-20 16:52:40,495 - twisted - 274 - CRITICAL - sentinel - 
Traceback (most recent call last):
  File "/apps/synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 205, in maybeDeferred
    result = f(*args, **kwargs)
  File "/apps/synapse/lib/python3.10/site-packages/twisted/internet/endpoints.py", line 1045, in iterateEndpoint
    eachAttempt = endpoint.connect(protocolFactory)
  File "/apps/synapse/lib/python3.10/site-packages/twisted/internet/endpoints.py", line 647, in connect
    if isIPv6Address(self._host):
  File "/apps/synapse/lib/python3.10/site-packages/twisted/internet/abstract.py", line 539, in isIPv6Address
    return isIPAddress(addr, AF_INET6)
  File "/apps/synapse/lib/python3.10/site-packages/twisted/internet/abstract.py", line 510, in isIPAddress
    addr = addr.split("%", 1)[0]
AttributeError: 'int' object has no attribute 'split'
2022-09-20 16:52:40,589 - synapse.federation.transport.server.federation - 103 - INFO - PUT-21221 - Received txn 1663675468396 from matrix.org. (PDUs: 1, EDUs: 1)
2022-09-20 16:52:40,693 - synapse.access.http.8008 - 450 - INFO - GET-21219 - 176.126.240.158 - 8008 - {matrix.org} Processed request: 0.200sec/0.105sec (0.002sec, 0.000sec) (0.001sec/0.001sec/1) 25B 200 "GET /_matrix/federation/v1/query/profile?user_id=%40username%3Amatrix.example.com&field=displayname HTTP/1.0" "Synapse/1.68.0rc1 (b=matrix-org-hotfixes,f4419438c1)" [0 dbevts]
2022-09-20 16:52:40,694 - synapse.handlers.receipts - 77 - INFO - PUT-21221 - Ignoring receipt for room '!OGEhHVWSdvArJzumhm:matrix.org' from server matrix.org as we're not in the room
2022-09-20 16:52:40,799 - synapse.access.http.8008 - 450 - INFO - GET-21222 - 176.126.240.158 - 8008 - {matrix.org} Processed request: 0.202sec/0.104sec (0.001sec, 0.000sec) (0.000sec/0.024sec/1) 25B 200 "GET /_matrix/federation/v1/query/profile?user_id=%40username%3Amatrix.example.com&field=displayname HTTP/1.0" "Synapse/1.68.0rc1 (b=matrix-org-hotfixes,f4419438c1)" [0 dbevts]
2022-09-20 16:52:40,799 - synapse.federation.federation_server - 445 - INFO - PUT-21221 - Ignoring PDU for unknown room_id: !OGEhHVWSdvArJzumhm:matrix.org
2022-09-20 16:52:41,004 - synapse.access.http.8008 - 450 - INFO - PUT-21221 - 176.126.240.157 - 8008 - {matrix.org} Processed request: 0.410sec/0.102sec (0.004sec, 0.000sec) (0.002sec/0.027sec/3) 11B 200 "PUT /_matrix/federation/v1/send/1663675468396 HTTP/1.0" "Synapse/1.68.0rc1 (b=matrix-org-hotfixes,f4419438c1)" [0 dbevts]
2022-09-20 16:52:41,519 - synapse.access.http.8008 - 450 - INFO - GET-21223 - 176.126.240.158 - 8008 - {matrix.org} Processed request: 0.213sec/0.094sec (0.001sec, 0.000sec) (0.000sec/0.006sec/1) 19B 200 "GET /_matrix/federation/v1/query/profile?user_id=%40username%3Amatrix.example.com&field=avatar_url HTTP/1.0" "Synapse/1.68.0rc1 (b=matrix-org-hotfixes,f4419438c1)" [0 dbevts]
2022-09-20 16:52:41,624 - synapse.access.http.8008 - 450 - INFO - GET-21224 - 176.126.240.158 - 8008 - {matrix.org} Processed request: 0.201sec/0.104sec (0.001sec, 0.000sec) (0.001sec/0.006sec/1) 19B 200 "GET /_matrix/federation/v1/query/profile?user_id=%40username%3Amatrix.example.com&field=avatar_url HTTP/1.0" "Synapse/1.68.0rc1 (b=matrix-org-hotfixes,f4419438c1)" [0 dbevts]
2022-09-20 16:52:41,625 - synapse.federation.transport.server.federation - 103 - INFO - PUT-21225 - Received txn 1658350968685 from aeum.net. (PDUs: 0, EDUs: 1)
2022-09-20 16:52:41,625 - synapse.federation.transport.server.federation - 103 - INFO - PUT-21226 - Received txn 1662387295053 from zougloub.eu. (PDUs: 0, EDUs: 1)
2022-09-20 16:52:41,626 - twisted - 274 - CRITICAL - sentinel - Unhandled error in Deferred:
2022-09-20 16:52:41,627 - twisted - 274 - CRITICAL - sentinel - 
Traceback (most recent call last):
  File "/apps/synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 205, in maybeDeferred
    result = f(*args, **kwargs)
  File "/apps/synapse/lib/python3.10/site-packages/twisted/internet/endpoints.py", line 1045, in iterateEndpoint
    eachAttempt = endpoint.connect(protocolFactory)
  File "/apps/synapse/lib/python3.10/site-packages/twisted/internet/endpoints.py", line 647, in connect
    if isIPv6Address(self._host):
  File "/apps/synapse/lib/python3.10/site-packages/twisted/internet/abstract.py", line 539, in isIPv6Address
    return isIPAddress(addr, AF_INET6)
  File "/apps/synapse/lib/python3.10/site-packages/twisted/internet/abstract.py", line 510, in isIPAddress
    addr = addr.split("%", 1)[0]
AttributeError: 'int' object has no attribute 'split'

Anything else that would be useful to know?

I'm happy to do any testing, patching, or other work to help.

@H-Shay
Copy link
Contributor

H-Shay commented Sep 20, 2022

Thanks for reporting, a few questions: are you able to join any other rooms on matrix.org? What happens when the join fails-do you see an error message in the client, etc?

@H-Shay H-Shay added the X-Needs-Info This issue is blocked awaiting information from the reporter label Sep 20, 2022
@593kdk32idk32
Copy link
Author

Hello,

The client is the Element flatpack on Fedora 36. The GUI error is.

Failed to join
Failed to send_join via any server
or
Failed to join
There was an error joining.

I can join most rooms, but not others. For example, both #matrix:matrix.org, and #python:matrix.org are not working. But #fediverse:pixie.town and #FreeBSD:matrix.org work fine.There's no errors in the Nginx logs, or timeouts that I can see. Other than the posted error log here, I don't see anything wrong in the homeserver.log.

@H-Shay
Copy link
Contributor

H-Shay commented Sep 20, 2022

Interesting-I don't see the send_join failure in the logs. Would you be able to provide (ideally DEBUG) logs showing from a little before the initial join request until after the failure? It would be helpful for getting a better sense of what's happening.

@H-Shay H-Shay changed the title CRITICAL - sentinel - Unhandled error in Deferred: Uanble to join some rooms on matrix.org Sep 20, 2022
@593kdk32idk32
Copy link
Author

Hello,

Yes, is there a place to send the full log other than a public ticket? Here's at least the snippets that reference send_join.

Note, I don't have the previous showed IPV6 error anywere in this log, so my guess that was the issue seems wrong.


2022-09-21 00:43:41,374 - synapse.http.server - 456 - DEBUG - sentinel - Registering for PUT ^/_matrix/federation/v1/send_join/(?P<room_id>[^/]*)/(?P<event_id>[^/]*)$
 
2022-09-21 00:43:41,374 - synapse.http.server - 456 - DEBUG - sentinel - Registering for PUT ^/_matrix/federation/v2/send_join/(?P<room_id>[^/]*)/(?P<event_id>[^/]*)$

2022-09-21 00:45:31,719 - synapse.http.matrixfederationclient - 570 - DEBUG - POST-187 - {PUT-O-8} [matrix.org] Sending request: PUT matrix://matrix.org/_matrix/federation/v2/send_join/%21OGEhHVWSdvArJzumhm%3Amatrix.org/%24_9hRDKO3MuPOly1ZNBriAhDCwsMyJ_NNjPzEYWfmk3s; timeout 60.000000s

2022-09-21 00:46:31,720 - synapse.http.matrixfederationclient - 672 - INFO - POST-187 - {PUT-O-8} [matrix.org] Request failed: PUT matrix://matrix.org/_matrix/federation/v2/send_join/%21OGEhHVWSdvArJzumhm%3Amatrix.org/%24_9hRDKO3MuPOly1ZNBriAhDCwsMyJ_NNjPzEYWfmk3s: ResponseNeverReceived:[CancelledError()]

2022-09-21 00:46:32,379 - synapse.http.matrixfederationclient - 570 - DEBUG - POST-187 - {PUT-O-8} [matrix.org] Sending request: PUT matrix://matrix.org/_matrix/federation/v2/send_join/%21OGEhHVWSdvArJzumhm%3Amatrix.org/%24_9hRDKO3MuPOly1ZNBriAhDCwsMyJ_NNjPzEYWfmk3s; timeout 60.000000s

2022-09-21 00:47:32,382 - synapse.http.matrixfederationclient - 672 - INFO - POST-187 - {PUT-O-8} [matrix.org] Request failed: PUT matrix://matrix.org/_matrix/federation/v2/send_join/%21OGEhHVWSdvArJzumhm%3Amatrix.org/%24_9hRDKO3MuPOly1ZNBriAhDCwsMyJ_NNjPzEYWfmk3s: ResponseNeverReceived:[CancelledError()]

2022-09-21 00:47:33,343 - synapse.http.matrixfederationclient - 570 - DEBUG - POST-187 - {PUT-O-8} [matrix.org] Sending request: PUT matrix://matrix.org/_matrix/federation/v2/send_join/%21OGEhHVWSdvArJzumhm%3Amatrix.org/%24_9hRDKO3MuPOly1ZNBriAhDCwsMyJ_NNjPzEYWfmk3s; timeout 60.000000s

@H-Shay
Copy link
Contributor

H-Shay commented Sep 21, 2022

You can DM me the full logs at @shayshay:matrix.org.

@593kdk32idk32 593kdk32idk32 changed the title Uanble to join some rooms on matrix.org Unable to join some rooms on matrix.org Sep 22, 2022
@richvdh
Copy link
Member

richvdh commented Sep 22, 2022

This stacktrace is weird:

2022-09-20 16:52:41,627 - twisted - 274 - CRITICAL - sentinel - 
Traceback (most recent call last):
  File "/apps/synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 205, in maybeDeferred
    result = f(*args, **kwargs)
  File "/apps/synapse/lib/python3.10/site-packages/twisted/internet/endpoints.py", line 1045, in iterateEndpoint
    eachAttempt = endpoint.connect(protocolFactory)
  File "/apps/synapse/lib/python3.10/site-packages/twisted/internet/endpoints.py", line 647, in connect
    if isIPv6Address(self._host):
  File "/apps/synapse/lib/python3.10/site-packages/twisted/internet/abstract.py", line 539, in isIPv6Address
    return isIPAddress(addr, AF_INET6)
  File "/apps/synapse/lib/python3.10/site-packages/twisted/internet/abstract.py", line 510, in isIPAddress
    addr = addr.split("%", 1)[0]
AttributeError: 'int' object has no attribute 'split'

It suggests that somehow we're constructing a TCP6ClientEndpoint with _host set to an int... which in turn suggests that the resolver is returning a IPv6Address with host set to an int. Could be something to do with NetBSD? Or something to do with our SRV record resolver?

However, I suspect the reason you can't join !OGEhHVWSdvArJzumhm:matrix.org is unrelated, but is just #1211.

@593kdk32idk32
Copy link
Author

593kdk32idk32 commented Sep 22, 2022

It suggests that somehow we're constructing a TCP6ClientEndpoint with _host set to an int... which in turn suggests that the resolver is returning a IPv6Address with host set to an int. Could be something to do with NetBSD? Or something to do with our SRV record resolver?

Here's the DNS records as seen from my matrix instance.
matrix.org has address 104.20.200.37
matrix.org has address 104.20.201.37
matrix.org has address 172.67.19.90
matrix.org has IPv6 address 2606:4700:10::6814:c825
matrix.org has IPv6 address 2606:4700:10::ac43:135a
matrix.org has IPv6 address 2606:4700:10::6814:c925

I'm not sure about any SRV records. The system is FreeBSD, not NetBSD, though I don't know if that makes a difference.

In regards to #1211, my issue is not that it's slow, but that's it's impossible. It's not just the main matrix room this happens to. Are there any settings to increase the timeouts to allow the rooms to load?

@DMRobertson
Copy link
Contributor

The timeout looks to be 60 seconds on the joiner's side:

timeout: number of milliseconds to wait for the response.
self._default_timeout (60s) by default.

Not currently configurable, but it's read from here:

self.default_timeout = 60

You could try bumping this 60 to a larger number---but it's really a bodge around that fact that remote room joins are currently very expensive.

@DMRobertson DMRobertson added A-Federated-Join joins over federation generally suck S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. O-Frequent Affects or can be seen by most users regularly or impacts most users' first experience O-Occasional Affects or can be seen by some users regularly or most users rarely and removed X-Needs-Info This issue is blocked awaiting information from the reporter labels Sep 27, 2022
@593kdk32idk32
Copy link
Author

Bumping self.default_timeout to 600 lets me join all the rooms. The client still throws an error and timeouts, but the room show up at a later time regardless.

@DMRobertson
Copy link
Contributor

Bumping self.default_timeout to 600 lets me join all the rooms. The client still throws an error and timeouts, but the room show up at a later time regardless.

Thanks, I think that confirms that #1211 is the underlying cause. The in-progress faster remote room joins project should help here in the coming months. (https://github.com/matrix-org/synapse/milestone/8, https://github.com/matrix-org/synapse/milestone/10)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Federated-Join joins over federation generally suck O-Frequent Affects or can be seen by most users regularly or impacts most users' first experience O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

No branches or pull requests

4 participants