This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Unable to connect to homeserver, retrying... - Lost access to homeserver on mobile, web, and desktop apps after logout #12882

Closed
johndball opened this issue May 26, 2022 · 25 comments
Labels
A-Sync: defects related to /sync
S-Major: Major functionality / product severely impaired, no satisfactory workaround.
T-Defect: Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
X-Needs-Info: This issue is blocked awaiting information from the reporter

Comments

@johndball

johndball commented May 26, 2022

Description

Steps to reproduce

This was originally posted under Element Web, but I later discovered that it is an issue with the Matrix Synapse server. Because my original post was closed under Element, I am moving it to Synapse.

  1. I logged out and am unable to log back in. On my mobile device, I get a "verify session" prompt, but the client never loads the content.
  2. This behavior started recently and occurs after a user logs out of a client on a self-run homeserver. It affects the Element Windows desktop app 1.10.13, app.element.io on the web, and Element mobile for iOS and Android.
  3. Web versions of Element show a sync that never completes. In the browser developer tools I can see "Number of consecutive failed sync requests: 36" and other sync-related errors.
  4. Clearing the cache on a device logs the user out, and they cannot log back in. While trying to regain access to my homeserver, I cleared the cache on mobile and desktop, which locked me out of the homeserver entirely.
  5. On Matrix Synapse 1.59.1, a user who logs out is unable to log back in. Rolling back to Matrix Synapse 1.57.0 lets users log back in, and the sync error is gone.

Version information

  • Homeserver:
    We are running two custom homeservers - a production version and a federated/development homeserver.
  • Version:
    1.59.1 - buggy - users unable to log in after clearing cache or logging out
    1.58.1 - buggy, but some users can log in - the users that can log in have joined 2 or fewer rooms. The users that cannot log in have 3 or more rooms. My account is 15 rooms deep.
    1.57.0 - works fine. All users who previously could not log in can log in.
    1.56.0 - unable to downgrade to this version due to database constraints

  • Install method:
    Installed via repo/package manager

  • Platform:
    Ubuntu 20.04 LTS
  • Others:
    One homeserver uses an apache proxy server for communications, the other has no proxy. Both homeservers sit behind Cloudflare. One is on-prem and the other is third-party cloud provider. Both homeservers are federated.
    I do not have verbose logging enabled, but can re-enable if needed.

  • Screenshot from Element Web, Element Desktop, and Element Mobile:
    [Three screenshots, not reproduced here]

@squahtx
Contributor

squahtx commented May 26, 2022

Possibly the same as #12864 but we would very much like Synapse logs to confirm.
Are you able to provide INFO-level logs covering the time period of a failed login?

@squahtx squahtx added the X-Needs-Info This issue is blocked awaiting information from the reporter label May 26, 2022
@johndball
Author

johndball commented May 26, 2022

Absolutely. Bear with me, but do I set the log level in homeserver.yaml and which logs (homeserver.log?) are you looking for?
If there is sensitive information in the log (specifically homeserver URL, etc.) can I DM?

Edit: Looks like I had database set to INFO, but root set to ERROR. I set root to INFO under log.yaml. Just advise on sensitive information and if homeserver.log is the log being requested. Thanks!

@squahtx
Contributor

squahtx commented May 26, 2022

log_config in homeserver.yaml will point to the log config file. Inside the log config file, there'll be a line like:

root:
    level: INFO

which specifies the log level. It's INFO by default, so if you've not changed it, you probably don't need to do anything there.

It's homeserver.log we're after, unless you've got workers set up.

If you want to DM the logs, I'm reachable at @squah:matrix.org.
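
For reference, Synapse's log config follows the schema of Python's standard `logging.config.dictConfig`, so the effect of the `root` level can be sketched with the standard library alone. The dict below is an assumption standing in for a parsed log config file, the logger names are only illustrative, and `info_enabled` is a hypothetical helper. It shows why a root level of ERROR hides INFO lines from every logger that does not set its own level:

```python
import logging
import logging.config


def info_enabled(root_level: str, logger_name: str) -> bool:
    """Apply a minimal dictConfig and report whether INFO lines would be emitted."""
    config = {
        "version": 1,
        "disable_existing_loggers": False,
        # Equivalent of the YAML "root: level: ..." entry.
        "root": {"level": root_level},
        # A logger with its own level is not affected by the root level.
        "loggers": {"synapse.storage.SQL": {"level": "INFO"}},
    }
    logging.config.dictConfig(config)
    return logging.getLogger(logger_name).isEnabledFor(logging.INFO)


if __name__ == "__main__":
    for level in ("ERROR", "INFO"):
        print(level, info_enabled(level, "synapse.handlers.sync"))
```

With the root at ERROR, only loggers that carry an explicit INFO level still emit INFO lines, which matches the "database set to INFO, but root set to ERROR" situation described above.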

@johndball
Author

johndball commented May 26, 2022

Didn't take long at all. Upgraded from 1.57.0 to 1.59.1 and it immediately went haywire once I cleared cache and reloaded.

Downgraded and the clients can sign in.

dpkg: warning: downgrading matrix-synapse-py3 from 1.59.1+focal1 to 1.57.0+focal1
(Reading database ... 169057 files and directories currently installed.)
Preparing to unpack matrix-synapse-py3_1.57.0+focal1_amd64.deb ...
Unpacking matrix-synapse-py3 (1.57.0+focal1) over (1.59.1+focal1) ...
Setting up matrix-synapse-py3 (1.57.0+focal1) ...

I will DM you a shared folder to download the logs from both Firefox Element web and homeserver.log in INFO mode.

@johndball
Author

Hi @squahtx - just wanted to check in and ensure that the logs I uploaded were sufficient. Please let me know if additional details are required and I will be happy to support. The link for the log download will expire in 7 days. Thanks.

@squahtx
Contributor

squahtx commented May 27, 2022

I've received them, thank you! I'm tied up with other things and haven't had the time to look at them yet.

In the meantime, could you try running the following SQL queries from @richvdh in #12864:

SELECT * FROM events e LEFT JOIN rejections r USING (event_id) WHERE event_id IN ('$KuawYhzU8G0qk7zUX2jEI_0PQH1yFnj-2CiiFF4K2UM', '$bEbwpl46iHvYccVAwpkLp6-qzxdeCCmiQTxbo1MObAQ');
SELECT event_id, internal_metadata FROM event_json WHERE event_id IN ('$KuawYhzU8G0qk7zUX2jEI_0PQH1yFnj-2CiiFF4K2UM', '$bEbwpl46iHvYccVAwpkLp6-qzxdeCCmiQTxbo1MObAQ'); 
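
The first query's `LEFT JOIN ... USING (event_id)` returns one row per matching event, with the rejection columns left NULL when the event was not rejected, and no row at all when the event is not stored. A minimal sketch of that behaviour, run against an in-memory SQLite database with a deliberately simplified stand-in schema (the two-column tables, the `reason` column, the room ID, the inserted data, and the `find_events` helper are assumptions for illustration, not Synapse's real schema):

```python
import sqlite3

# Event IDs taken from the queries above.
EVENT_IDS = (
    "$KuawYhzU8G0qk7zUX2jEI_0PQH1yFnj-2CiiFF4K2UM",
    "$bEbwpl46iHvYccVAwpkLp6-qzxdeCCmiQTxbo1MObAQ",
)


def make_demo_db() -> sqlite3.Connection:
    """Build a simplified stand-in for the events and rejections tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events (event_id TEXT PRIMARY KEY, room_id TEXT);
        CREATE TABLE rejections (event_id TEXT PRIMARY KEY, reason TEXT);
        """
    )
    # Made-up data: only the first event is stored, and it has no rejection row.
    conn.execute(
        "INSERT INTO events VALUES (?, ?)", (EVENT_IDS[0], "!room:example.org")
    )
    return conn


def find_events(conn: sqlite3.Connection, event_ids):
    """Return (event_id, rejection reason or None) for each stored event."""
    placeholders = ", ".join("?" for _ in event_ids)
    sql = (
        "SELECT event_id, r.reason FROM events e "
        "LEFT JOIN rejections r USING (event_id) "
        f"WHERE event_id IN ({placeholders})"
    )
    return conn.execute(sql, tuple(event_ids)).fetchall()


if __name__ == "__main__":
    for row in find_events(make_demo_db(), EVENT_IDS):
        print(row)
```

Binding the IDs as parameters also avoids quoting pitfalls, such as a shell expanding the leading `$` when a query is passed inside double quotes. Against PostgreSQL the same pattern should apply, using the placeholder style of whichever driver is in use.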

@johndball
Author

I attempted but continue to receive a query error due to unexpected syntax. Sorry, PostgreSQL is not my cup of tea.

@squahtx
Contributor

squahtx commented May 27, 2022

Do you mind posting the query error you're seeing?

@johndball
Author

> Do you mind posting the query error you're seeing?

The query is not executing. It is most likely my lack of knowledge of PostgreSQL.

@MRAAGH

MRAAGH commented May 28, 2022

Other users and I are experiencing this issue on both of my homeservers (mazie.rocks running Synapse 1.58.1 and aagrinder.xyz running Synapse 1.59.1). Downgrading Synapse 1.59.1 to 1.57.0 did NOT fix it. I tried different clients:

  • Element desktop 1.10.13: stuck on infinite sync; upon restarting it says "unable to connect to homeserver. retrying"
  • Element web 1.10.13 and 1.10.10: same as above
  • Fluffychat 1.4.0 web: "No connection to the server"
  • Fluffychat 1.4.0 on Android: able to log in normally
  • Moment (using matrix-nio): able to log in normally
  • Fractal on Ubuntu: able to log in normally
  • NeoChat on Ubuntu: able to log in normally

In all cases, clients that were already logged in, work perfectly fine. The issue only appears at login.
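
Since only fresh logins fail, one way to take the client out of the picture is to send the same kind of request a newly logged-in client sends: an initial `/sync`, meaning one with no `since` token. A minimal sketch using only the standard library; `build_initial_sync_request` and `probe` are hypothetical helper names, the base URL and token are placeholders, and the `v3` path prefix assumes a homeserver recent enough to serve it:

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode


def build_initial_sync_request(base_url: str, access_token: str) -> urllib.request.Request:
    """Build a GET /sync request with no `since` token (an initial sync)."""
    query = urlencode({"timeout": 0})
    url = base_url.rstrip("/") + "/_matrix/client/v3/sync?" + query
    return urllib.request.Request(
        url, headers={"Authorization": "Bearer " + access_token}
    )


def probe(base_url: str, access_token: str) -> tuple:
    """Send the request and return (HTTP status, response body as text)."""
    request = build_initial_sync_request(base_url, access_token)
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a body worth reading.
        return err.code, err.read().decode("utf-8", "replace")


if __name__ == "__main__":
    # Placeholder values; this only prints the request that would be sent.
    req = build_initial_sync_request("https://server.domain.com/", "ACCESS_TOKEN")
    print(req.get_method(), req.full_url)
```

If `probe` returns an error status while clients with an existing session keep working, that points at the server's handling of initial syncs, not at any one client.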

@squahtx
Contributor

squahtx commented May 30, 2022

It's highly likely this is the same issue as #12864, which richvdh has a proposed fix for.

Are either of you willing to run a bleeding-edge development branch of Synapse to test whether it fixes the issue?
#12905
https://github.com/matrix-org/synapse/tree/rav/fix_sync_404

@squahtx squahtx added A-Sync defects related to /sync S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. labels May 30, 2022
@MRAAGH

MRAAGH commented May 30, 2022

Currently I'm running in docker; I suppose I could temporarily substitute that with a freshly compiled synapse instance, if I configure it correctly

@MRAAGH

MRAAGH commented May 30, 2022

> It's highly likely this is the same issue as #12864, which richvdh has a proposed fix for.

This sounds too specific to be the cause for my issue. For me, it happens on all accounts, even fresh accounts that have never logged in before and are not members of any rooms.

@squahtx
Contributor

squahtx commented May 30, 2022

> > It's highly likely this is the same issue as #12864, which richvdh has a proposed fix for.
>
> This sounds too specific to be the cause for my issue. For me, it happens on all accounts, even fresh accounts that have never logged in before and are not members of any rooms.

In that case, could you file a new issue and include synapse logs spanning the login and sync attempts of a fresh account?

@johndball
Author

> It's highly likely this is the same issue as #12864, which richvdh has a proposed fix for.
>
> Are either of you willing to run a bleeding-edge development branch of Synapse to test whether it fixes the issue? #12905 https://github.com/matrix-org/synapse/tree/rav/fix_sync_404

Yeah, I can do that as long as I can roll back using "dpkg -i" to a previous package. Do you know if I will need to roll back the database too?

@squahtx
Contributor

squahtx commented May 31, 2022

> Yeah, I can do that as long as I can roll back using "dpkg -i" to a previous package. Do you know if I will need to roll back the database too?

The database shouldn't need rolling back. I'd still recommend taking a backup before the upgrade, just in case.

@johndball
Author

> > Yeah, I can do that as long as I can roll back using "dpkg -i" to a previous package. Do you know if I will need to roll back the database too?
>
> The database shouldn't need rolling back. I'd still recommend taking a backup before the upgrade, just in case.

Sure. Let me know how to run the latest. My OS is Ubuntu 20.04 LTS and database backend is Postgres.

@johndball
Author

johndball commented Jun 1, 2022

> I've received them, thank you! I'm tied up with other things and haven't had the time to look at them yet.
>
> In the meantime, could you try running the following SQL queries from @richvdh in #12864:
>
> SELECT * FROM events e LEFT JOIN rejections r USING (event_id) WHERE event_id IN ('$KuawYhzU8G0qk7zUX2jEI_0PQH1yFnj-2CiiFF4K2UM', '$bEbwpl46iHvYccVAwpkLp6-qzxdeCCmiQTxbo1MObAQ');
> SELECT event_id, internal_metadata FROM event_json WHERE event_id IN ('$KuawYhzU8G0qk7zUX2jEI_0PQH1yFnj-2CiiFF4K2UM', '$bEbwpl46iHvYccVAwpkLp6-qzxdeCCmiQTxbo1MObAQ');

Completed successfully. I wasn't sure of the next step after this, so I upgraded to 1.60.0. The issue still exists there, so I rolled back to 1.57.0.

@H-Shay
Contributor

H-Shay commented Jun 3, 2022

If you want to wait for a packaged version with the proposed fix, the release candidate should be out Tuesday, which might be the easiest option in this case.

@johndball
Author

> If you want to wait for a packaged version with the proposed fix, the release candidate should be out Tuesday, which might be the easiest option in this case.

I'm cool with that. Whatever y'all think best. Thanks for working through this. :-)

@johndball
Author

johndball commented Jun 15, 2022

Hi folks, any update on this? Has the 14 June 2022 release corrected the issue? I am willing to update and try, but I need to be sure that I can roll back to 1.57 if the issue still exists. I was hoping that somebody could confirm that the v1.61.0 update fixed the issue.

@MRAAGH

MRAAGH commented Jun 15, 2022

I forgot to mention here: for me, the issue was fixed by changing public_baseurl (see #12920).

@johndball
Author

johndball commented Jun 15, 2022

> I forgot to mention here: for me, the issue was fixed by changing public_baseurl (see #12920).

Yeah, I tried your solution early on in the hope that it would fix my problem, but changing what I had ended up taking down my entire server. I was hoping that it was a simple misconfiguration on my end. Perhaps it is... this is what I have, and it still works with 1.57.0.

public_baseurl: https://server.domain.com/

@richvdh
Member

richvdh commented Jun 15, 2022

The "Could not find event" error suggests it is unrelated to public_baseurl.

I'm reasonably confident this will be fixed in 1.61.0 by #12905. Suggest you try it: 1.61.0 does not introduce any changes which stop you rolling back to 1.57.x.

@johndball
Author

johndball commented Jun 15, 2022

> The "Could not find event" error suggests it is unrelated to public_baseurl.
>
> I'm reasonably confident this will be fixed in 1.61.0 by #12905. Suggest you try it: 1.61.0 does not introduce any changes which stop you rolling back to 1.57.x.

user@server:~# curl http://localhost:8008/_synapse/admin/v1/server_version
{"server_version":"1.61.0","python_version":"3.8.10"}

It worked! Logged in via mobile, web, and desktop. No issues. Sync was much faster compared to < 1.58.0 as well.
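
The same check can be scripted. A minimal sketch that parses the JSON returned by the `server_version` endpoint shown above and compares it with the versions reported in this thread; `parse_server_version` and `in_reported_bad_range` are hypothetical helpers, and the lower bound of 1.58.0 is an assumption, since the thread only confirms 1.58.1, 1.59.1 and 1.60.0 as failing and 1.57.0 and 1.61.0 as working:

```python
import json
import re

# Reported in this thread: 1.57.0 works, 1.58.1 to 1.60.0 fail, 1.61.0 works.
FIRST_BAD = (1, 58, 0)  # assumption: the thread only confirms 1.58.1
FIRST_FIXED = (1, 61, 0)


def parse_server_version(body: str) -> tuple:
    """Extract (major, minor, patch) from the endpoint's JSON response."""
    version = json.loads(body)["server_version"]
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: " + repr(version))
    return tuple(int(part) for part in match.groups())


def in_reported_bad_range(version: tuple) -> bool:
    """True if the version falls in the range this thread reports as failing."""
    return FIRST_BAD <= version < FIRST_FIXED


if __name__ == "__main__":
    body = '{"server_version":"1.61.0","python_version":"3.8.10"}'
    version = parse_server_version(body)
    print(version, in_reported_bad_range(version))
```

The range reflects johndball's reports only; MRAAGH's login failure had a different cause and was not tied to these versions.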
