signal-desktop hangs after ca. 1 day of running and fails to send or receive messages #6577

Open
iridos opened this issue Aug 16, 2023 · 33 comments

Comments

@iridos

iridos commented Aug 16, 2023

  • [ x ] I have searched open and closed issues for duplicates
  • [ x ] I am using Signal-Desktop as provided by the Signal team, not a 3rd-party package.

Bug Description

signal-desktop hangs after about a day of running, usually the day after it has been left running overnight.

Steps to Reproduce

System: Debian/openbox/xfce4-panel

  1. run signal-desktop
  2. wait 1-2 days
  3. try to send or receive messages
  4. when a message is sent from signal-desktop, the device it is linked to doesn't receive the message either

Actual Result:

After that time, no more messages are received and trying to send messages stops at an incomplete first circle with a dashed border - see screenshot.

Expected Result:

sends/receives messages

Screenshots

Screenshot:
signal

Platform Info

Signal-desktop Version:

6.26.0
production

ii signal-desktop 6.26.0 amd64

Operating System: Debian Linux 12.0 and 11.0 (I have seen the same behavior on two different systems with different Debian versions)

Linked Device Version:

6.28.6

Link to Debug Log

https://debuglogs.org/desktop/6.26.0/d4e9a9f3bb7f663cf4d024deb731126fe367bb4cbffe5a3999efef79bc35bba3.gz

Device debug log:

https://debuglogs.org/android/6.28.6/5bae01c3493ddf84e176887d1e285c693ca6271877289f0535338cf38dbb4af4

@iridos
Author

iridos commented Aug 17, 2023

I restarted yesterday. Messages written to the terminal it was started from since then (excerpt):

{"level":30,"time":"2023-08-16T13:34:24.032Z","msg":"System tray service: setting unread count to 0"}
{"level":30,"time":"2023-08-16T13:34:24.032Z","msg":"System tray service: rendering no tray"}
{"level":30,"time":"2023-08-16T13:36:41.182Z","msg":"System tray service: setting unread count to 1"}
{"level":30,"time":"2023-08-16T13:36:41.182Z","msg":"System tray service: rendering no tray"}
{"level":30,"time":"2023-08-16T13:36:45.935Z","msg":"System tray service: setting unread count to 0"}
{"level":30,"time":"2023-08-16T13:36:45.935Z","msg":"System tray service: rendering no tray"}
{"level":30,"time":"2023-08-16T15:20:03.545Z","msg":"Updating BrowserWindow config: %s {\"maximized\":false,\"autoHideMenuBar\":false,\"fullscreen\":false,\"width\":1014,\"height\":696,\"x\":2579,\"y\":47}"}
{"level":30,"time":"2023-08-16T15:20:03.545Z","msg":"config/set: Saving ephemeral config to disk"}
{"level":30,"time":"2023-08-16T15:20:03.546Z","msg":"config/set: Saved ephemeral config to disk"}
{"level":50,"time":"2023-08-17T07:53:28.485Z","msg":"Error occurred in handler for 'net.resolveHost': {}"}
{"level":50,"time":"2023-08-17T07:53:28.485Z","msg":"Error occurred in handler for 'net.resolveHost': {}"}
{"level":50,"time":"2023-08-17T07:53:28.485Z","msg":"Error occurred in handler for 'net.resolveHost': {}"}
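
The terminal output above is one JSON object per line, so it can be filtered mechanically. A minimal sketch, assuming the numeric levels follow the pattern visible in this excerpt (30 on the routine lines, 50 on the error lines); the function name `errors_only` is made up for illustration:

```python
import json

def errors_only(lines, min_level=50):
    """Yield (time, msg) for JSON log lines at or above min_level."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON log record
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate truncated or interleaved output
        if record.get("level", 0) >= min_level:
            yield record.get("time"), record.get("msg")
```

Feeding it the saved terminal output (for example `errors_only(open("signal.log"))`, where `signal.log` is a hypothetical file name) would leave only the `net.resolveHost` lines from the excerpt.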

@iridos
Author

iridos commented Aug 30, 2023

any ideas? even for a workaround? right now I have to restart daily and have no idea what is happening

@knarrff

knarrff commented Sep 5, 2023

I now see the same. Also on a Debian (bookworm) system, with signal version 6.29.1 (which I believe is the latest Debian package available).

@trevor-signal
Contributor

@knarrff can you provide a debug log?

@knarrff

knarrff commented Sep 6, 2023

@knarrff can you provide a debug log?

Next time it happens. I'll likely have to wait for a day or so.

@knarrff

knarrff commented Sep 8, 2023

https://debuglogs.org/desktop/6.29.1/7818872c4c13ea491b91428ec7c26f327679df0b49bba7f574624fb8c8120c12.gz

You probably know better what to look for. Just a few things that might help:

  • I did suspend/resume a few times. That may be of importance for errors like
    • TaskWithTimeout: SQL channel call (getNextAttachmentDownloadJobs) has been running for 1800005ms
    • Error: SQL channel call (getNextAttachmentDownloadJobs) task did not complete in time
    • -> if this does not measure uptime, but total time, that timeout measure is bound to trip on suspend/resume-cycles.
  • I see INFO 2023-09-07T10:59:22.597Z NotificationService: disabling, which may have been the time the app did not get notifications anymore, but also did not send anything anymore. A few lines before I read
WARN  2023-09-07T05:28:51.619Z WebSocketResource(authenticated): Socket closed
INFO  2023-09-07T05:29:23.941Z UpdateKeysListener: Next update scheduled for 2023-09-07T20:43:25.525Z 
INFO  2023-09-07T05:30:23.941Z UpdateKeysListener: Next update scheduled for 2023-09-07T20:43:25.525Z
INFO  2023-09-07T05:31:23.941Z UpdateKeysListener: Next update scheduled for 2023-09-07T20:43:25.525Z
INFO  2023-09-07T05:32:23.941Z UpdateKeysListener: Next update scheduled for 2023-09-07T20:43:25.525Z
INFO  2023-09-07T05:33:23.941Z UpdateKeysListener: Next update scheduled for 2023-09-07T20:43:25.525Z
INFO  2023-09-07T05:33:23.948Z RetryPlaceholders.getExpiredAndRemove: Found 0 expired items
INFO  2023-09-07T05:33:23.956Z retryPlaceholders/interval: Found 0 expired items
INFO  2023-09-07T05:34:23.941Z UpdateKeysListener: Next update scheduled for 2023-09-07T20:43:25.525Z
INFO  2023-09-07T05:35:23.941Z UpdateKeysListener: Next update scheduled for 2023-09-07T20:43:25.525Z
INFO  2023-09-07T05:36:23.941Z routineProfileRefresh/2: starting
INFO  2023-09-07T05:36:23.942Z routineProfileRefresh/2: updating last refresh time
INFO  2023-09-07T05:36:23.942Z UpdateKeysListener: Next update scheduled for 2023-09-07T20:43:25.525Z 
INFO  2023-09-07T05:36:23.960Z routineProfileRefresh/2: starting to refresh conversations
INFO  2023-09-07T05:36:23.962Z routineProfileRefresh/2: refreshing profile for [REDACTED]36c ([REDACTED]022)
INFO  2023-09-07T05:36:23.962Z getProfile: getting unversioned profile for conversation [REDACTED]36c ([REDACTED]022)
INFO  2023-09-07T05:36:23.962Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]36c
INFO  2023-09-07T05:36:23.962Z Cycling agent for type undefined-auth
INFO  2023-09-07T05:36:23.962Z routineProfileRefresh/2: refreshing profile for [REDACTED]651 ([REDACTED]64a)
INFO  2023-09-07T05:36:23.963Z getProfile: getting unversioned profile for conversation [REDACTED]651 ([REDACTED]64a)
INFO  2023-09-07T05:36:23.963Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]651 
INFO  2023-09-07T05:36:23.963Z routineProfileRefresh/2: refreshing profile for [REDACTED]111 ([REDACTED]f2a)
INFO  2023-09-07T05:36:23.963Z routineProfileRefresh/2: refreshing profile for [REDACTED]71d ([REDACTED]705)
INFO  2023-09-07T05:36:23.963Z getProfile: getting unversioned profile for conversation [REDACTED]71d ([REDACTED]705)
INFO  2023-09-07T05:36:23.963Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]71d
INFO  2023-09-07T05:36:23.963Z routineProfileRefresh/2: refreshing profile for [REDACTED]0e2 ([REDACTED]59e)
ERROR 2023-09-07T05:36:23.963Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]36c 0 Error
ERROR 2023-09-07T05:36:23.963Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]651 0 Error
INFO  2023-09-07T05:36:23.963Z routineProfileRefresh/2: refreshed profile for [REDACTED]111 ([REDACTED]f2a)
ERROR 2023-09-07T05:36:23.963Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]71d 0 Error
INFO  2023-09-07T05:36:23.963Z routineProfileRefresh/2: refreshing profile for [REDACTED]eb2 ([REDACTED]d15)
INFO  2023-09-07T05:36:25.941Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]36c
ERROR 2023-09-07T05:36:25.942Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]36c 0 Error
INFO  2023-09-07T05:36:25.942Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]651
ERROR 2023-09-07T05:36:25.943Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]651 0 Error
INFO  2023-09-07T05:36:25.943Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]71d
ERROR 2023-09-07T05:36:25.943Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]71d 0 Error
INFO  2023-09-07T05:36:27.941Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]36c
ERROR 2023-09-07T05:36:27.942Z GET (WS) https://chat.signal.org/v1/profile/[REDACTED]36c 0 Error
WARN  2023-09-07T05:36:27.942Z getProfile failure: [REDACTED]36c ([REDACTED]022) code: -1

After that network connections seem to be recovering, but notifications stay off.
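
The suspend/resume point in the first bullet can be illustrated with a toy model. This is only a sketch under stated assumptions: `Deadline` and `FakeClocks` are hypothetical names, the two clocks are simulated, and nothing here is taken from Signal's `TaskWithTimeout`. The 1800 s timeout mirrors the 1800005ms figure in the log line.

```python
class Deadline:
    """Timeout that expires once `clock()` has advanced by `timeout_s`."""

    def __init__(self, timeout_s, clock):
        self._clock = clock
        self._expires_at = clock() + timeout_s

    def expired(self):
        return self._clock() >= self._expires_at


class FakeClocks:
    """Hypothetical pair of clocks used to simulate a suspend/resume cycle."""

    def __init__(self):
        self.wall = 0.0   # advances while the machine is suspended
        self.awake = 0.0  # advances only while the machine is running

    def run(self, seconds):
        self.wall += seconds
        self.awake += seconds

    def suspend(self, seconds):
        self.wall += seconds  # only the wall clock moves during suspend


clocks = FakeClocks()
by_wall = Deadline(1800, lambda: clocks.wall)
by_awake = Deadline(1800, lambda: clocks.awake)

clocks.run(60)        # one minute of real work
clocks.suspend(3600)  # machine suspended for an hour
clocks.run(5)         # five seconds after resume

print(by_wall.expired())   # True: the suspend counted against the task
print(by_awake.expired())  # False: only 65 s of running time elapsed
```

Which of the two behaviours the real timeout has is not established in this thread; the sketch only shows why a wall-clock deadline would trip after a suspend.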

@scottnonnenberg-signal
Contributor

@knarrff Your log also has these net.resolveHost errors. There's more specific data in some of those log lines - can you connect those ERR_NETWORK_CHANGED events to anything that happened on your computer?

ERROR 2023-09-04T04:15:36.059Z Error occurred in handler for 'net.resolveHost': {}
ERROR 2023-09-04T04:15:36.059Z Error occurred in handler for 'net.resolveHost': {}
ERROR 2023-09-06T13:57:03.540Z Error occurred in handler for 'net.resolveHost': {}
WARN  2023-09-06T13:57:03.559Z SocketManager: authenticated socket connection failed with error: HTTPError: connectResource: connectFailed; code: -1
  Caused by: Error: Error invoking remote method 'net.resolveHost': Error: net::ERR_NETWORK_CHANGED
ERROR 2023-09-06T13:57:03.619Z Top-level unhandled promise rejection: HTTPError: connectResource: connectFailed; code: -1
  Caused by: Error: Error invoking remote method 'net.resolveHost': Error: net::ERR_NETWORK_CHANGED
ERROR 2023-09-06T13:57:03.619Z Top-level unhandled promise rejection: HTTPError: connectResource: connectFailed; code: -1
  Caused by: Error: Error invoking remote method 'net.resolveHost': Error: net::ERR_NETWORK_CHANGED
ERROR 2023-09-08T08:01:56.958Z Error occurred in handler for 'net.resolveHost': {}
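
To line these events up with what the machine was doing, the timestamps can be pulled out of a saved debug log. A rough sketch, assuming the `LEVEL timestamp message` layout shown above and that indented `Caused by:` lines belong to the preceding record; `resolve_host_events` is a made-up helper name:

```python
import re
from datetime import datetime, timezone

HEADER = re.compile(r"^(INFO|WARN|ERROR)\s+(\S+Z)\s+(.*)$")

def resolve_host_events(lines):
    """Return (utc_datetime, level, text) for records mentioning net.resolveHost."""
    records = []
    for line in lines:
        match = HEADER.match(line)
        if match:
            level, stamp, text = match.groups()
            when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
            records.append([when.replace(tzinfo=timezone.utc), level, text])
        elif records:
            # Continuation lines ("Caused by: ...") extend the previous record.
            records[-1][2] += " " + line.strip()
    return [tuple(r) for r in records if "net.resolveHost" in r[2]]
```

Calling `.astimezone()` on each returned datetime converts it to local time, which makes it easier to compare against local system logs such as the NetworkManager journal.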

@iridos
Author

iridos commented Sep 14, 2023

Hmm, I do have the Error occurred in handler for 'net.resolveHost': {} messages, but nothing around them that seems connected.

I had tried killing the network thread of signal-desktop alone, which caused it to be restarted, but that did not recover full functionality.

@scottnonnenberg-signal
Contributor

@iridos @knarrff What can you tell us about your network setup?

@iridos
Author

iridos commented Sep 18, 2023

This machine has a wired connection set up via network manager connected to a university network. I don't do suspend/resumes here.

$ nmcli
enp0s25: connected to Wired connection 1
        "Intel 82579LM"
        ethernet (e1000e), 00:19:99:EB:37:0A, hw, mtu 1500
        ip4 default, ip6 default
        inet4 134.60.2.xxx/24
        route4 134.60.2.0/24 metric 100
        route4 default via 134.60.2.1 metric 100
        inet6 2001:7c0:3101:a04:ea77:cac2:c4c8:xxxx/64
        inet6 2001:7c0:3101:a04:6003:b63d:fc2d:xxxx/64
        inet6 2001:7c0:3101:a04:e94f:fc35:ff09:xxxx/64
        inet6 2001:7c0:3101:a04:dcf6:19af:aa1b:xxxx/64
        inet6 2001:7c0:3101:a04:b8a6:c4a3:e5c1:xxxx/64
        inet6 2001:7c0:3101:a04:b809:8fbe:164:xxxx/64
        inet6 2001:7c0:3101:a04:e1aa:8da8:c313:xxxx/64
        inet6 2001:7c0:3101:a04:219:99ff:feeb:xxxx/64
        inet6 fe80::219:99ff:feeb:xxxx/64
        route6 2001:7c0:3101:a04::/64 metric 100
        route6 fe80::/64 metric 1024
        route6 default via fe80::1 metric 100

lo: connected (externally) to lo
        "lo"
        loopback (unknown), 00:00:00:00:00:00, sw, mtu 65536
        inet4 127.0.0.1/8
        inet6 ::1/128
        route6 ::1/128 metric 256

DNS configuration:
        servers: 134.60.1.111
        interface: enp0s25

        servers: 2001:7c0:3100::111 2001:7c0:3100:1::111
        interface: enp0s25

Edit: I have disabled ipv6 for now and restarted after the next crash

@knarrff

knarrff commented Sep 19, 2023

@iridos @knarrff What can you tell us about your network setup?

Setup is as flexible as a typical laptop: partially wired ethernet and partially wifi. In case it is important: some of the networks it is connected to do support IPv6, some do not. Sometimes I intentionally disable ipv6 using /proc/sys/net/ipv6/conf/all/disable_ipv6. There are multiple wired ethernet and multiple wifi networks it regularly connects to. How the laptop is connected can change right after resume, but also right in the middle of normal operation.

network-manager deals with that, version 1.42.4-1 currently, on a Debian stable machine.
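
Since IPv6 is toggled by hand on this machine, it may help to record its state whenever a hang is noticed. A small sketch that reads the file named above; the helper name `ipv6_disabled` is an assumption made for illustration:

```python
from pathlib import Path

def ipv6_disabled(path="/proc/sys/net/ipv6/conf/all/disable_ipv6"):
    """Return True if the file contains "1", False for any other content,
    and None if the file cannot be read."""
    try:
        value = Path(path).read_text().strip()
    except OSError:
        return None
    return value == "1"
```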

@iridos
Author

iridos commented Sep 19, 2023 via email

@iridos
Author

iridos commented Sep 22, 2023

Hmm. Actually, I just noticed something after unlocking the screensaver: WhatsApp Web and signal-desktop both showed a "reconnecting to network" message. But I don't understand why that is. The network is definitely not limited to the times when my desktop is unlocked: I log in via ssh from home in the evening or morning all the time with no issue.

I think the issue wasn't triggered at that time with the visual "reconnecting" status but the next time after I had locked the screen. But there were no "could not resolve host" messages this time.

Now I wonder: why did signal-desktop have to reconnect at that time? The screensaver forcibly grabs all input - could this block something?

The screensaver used here is light-locker (default for the installed DE/WM). Maybe it depends on the screen-saver used. I just killed light-locker and started xscreensaver. We will see within the next week if that makes the problem disappear.

Best,
Karsten

@knarrff

knarrff commented Sep 22, 2023

Maybe just a red herring, but I also use light-locker (also default here).

@iridos
Author

iridos commented Sep 25, 2023

That seems to have made a difference. signal-desktop can still send&receive messages after the weekend, which is a longer time than what I can remember seeing without a hang.

There is now another problem: most emotes can't be shown after the weekend (I think all emotes that I hadn't recently used), but I have no idea whether that is a different manifestation of the same thing.

@scottnonnenberg-signal
Contributor

@iridos Emoji not loading often means that the app was updated out from under you, so on-disk references no longer work. Also causes crashes if you try to click a link. Do you have automatic apt updates configured?

@iridos
Author

iridos commented Sep 26, 2023

Hi,

another day without it hanging. I think removing light-locker was the change that made the problem disappear for me. Maybe you can try to reproduce it yourself now by using light-locker?

light-locker has an option --idle-hint/--no-idle-hint. I guess it sets that while it is active by default. xscreensaver does not seem to do that (at least the man-page does not mention it), so that is a possible pointer.
That seems to happen here https://github.com/the-cavalry/light-locker/blob/7587b53954a4d1c41a76d178d5e11ebb59eba922/src/gs-listener-dbus.c#L355
and be a call to dbus_message_new_method_call.
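
One way to watch for this from the outside is to poll the session's idle hint while the screen is locked. The sketch below assumes that the hint in question is the `IdleHint` property of the logind session, which this thread does not confirm; it shells out to `loginctl show-session`, and both function names are made up for illustration:

```python
import os
import subprocess

def parse_idle_hint(output):
    """Parse `IdleHint=yes|no` from loginctl output; None if absent."""
    for line in output.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "IdleHint":
            return value.strip() == "yes"
    return None

def session_idle_hint(session_id=None):
    """Ask logind for the idle hint of a session (defaults to $XDG_SESSION_ID)."""
    session_id = session_id or os.environ.get("XDG_SESSION_ID", "")
    try:
        result = subprocess.run(
            ["loginctl", "show-session", session_id, "-p", "IdleHint"],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        return None  # loginctl is not installed or could not be started
    return parse_idle_hint(result.stdout)
```

Logging `session_idle_hint()` once a minute under light-locker and under xscreensaver would show whether the two lockers really differ in this respect.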

@scottnonnenberg-signal I do. And it restarts services as needed, but of course, signal-desktop is not a service. Good explanation. But I think that is not a possible cause for the message-hangs I had experienced before but is unrelated, as the hangs could happen several times a day and automatic updates don't happen with that frequency.

@iridos
Author

iridos commented Oct 5, 2023

So, I have been away for the last couple of days - back and signal-desktop was still running without problems. I switched back to light-locker yesterday - and had to enable dpms and screen blanking via xorg with something like "xset s 300".

So after doing this yesterday, today signal-desktop hangs again like it did before.

Without screen blanking by xorg, light-locker wouldn't lock, except via command or key-stroke. I locked like that 2-3 times and signal-desktop kept running (but those few times are not enough for proof)

Have you already tried reproducing the error?

@iridos
Author

iridos commented Oct 19, 2023

Any news? Could you reproduce by now?

@iridos
Author

iridos commented Nov 8, 2023

I think the trigger is described quite closely now. Do you need further help to track the problem?

@mzguy

mzguy commented Nov 10, 2023

I'm having this issue. I'm not sure what the trigger is, since I'm not using light-locker.

I also don't suspend. Simply use Ubuntu stock lock screen.

@indutny-signal
Contributor

@mzguy could you submit and quote the debug log here when the issue happens again, please?

@iridos
Author

iridos commented Dec 8, 2023

Hi,
is this fixed? What caused the hangs?
Cheers,
I.

@mzguy

mzguy commented Dec 28, 2023

It's not fixed. I just submitted an issue to Signal with a debug log.

@iridos
Author

iridos commented Dec 29, 2023

@mzguy I don't think it's light-locker per se… locking and grabbing kbd is what xscreensaver also does.
light-locker doesn't blank itself, but leaves that to X. It may also tell dbus or something that the screen is locked and interactive processes can go to sleep. And this is what I suspect happens with signal. Some bits of it are getting paused and then some stuff gets out of sync and it can't recover from that.

Also… I think it's just a reflex to ask for the debug information. I looked at the debug information and I don't see a clue in it as to what's happening. Also, several people have already submitted debug information. How is one or several more going to help?

@iridos
Author

iridos commented May 7, 2024

Any news on this?
@indutny-signal - care to comment what has been completed?

@mzguy

mzguy commented May 7, 2024

I was also going to check on this today and got the notification of a new post!

I have to kill the Signal processes daily and restart them. If I don't notice, I type and try to send a message which often gets lost if I don't notice it's not going out.

@indutny-signal indutny-signal reopened this May 9, 2024
@indutny-signal
Contributor

Sorry about this. I know it might seem like a reflex, but it is hard to be sure we know what we are looking at without debug log. Could you still submit one right after reproducing the issue? Thank you!

@iridos
Author

iridos commented May 22, 2024

So as an update from me: after switching away from light-locker, I have not seen any hangs over months now.

After switching back to light-locker, I am still seeing the problem after having run signal-desktop in the background for 2 days.

Has anyone tried reproducing this using light-locker?

Light-locker does not seem to do much itself, but lets the X11 server do the blanking/powersaving. I could have done some more tests to narrow the actual cause down, but … well… some suggestions would have been nice, and also to know this is going towards a fix.

And yeah, it seems like a reflex and I do wonder what having YADD (yet another debug dump) is going to tell you that the previous debug dumps have not told you. Sure, it might be a different problem, but as the one I reported nearly a year ago remains unfixed, that point seems pretty moot.

Here's some more debug info, now from signal-desktop 7.9.0 (messages on connecting to pid omitted):

$ ps a -o pid,cmd | grep signa[l]-d | cut -b 1-100
 815831 /opt/Signal/signal-desktop
 815835 /opt/Signal/signal-desktop --type=zygote --no-zygote-sandbox
 815836 /opt/Signal/signal-desktop --type=zygote
 815838 /opt/Signal/signal-desktop --type=zygote
 815872 /opt/Signal/signal-desktop --type=gpu-process --enable-crash-reporter=6ce535c1-ac2e-4e0e-b20
 815878 /opt/Signal/signal-desktop --type=utility --utility-sub-type=network.mojom.NetworkService --
 815917 /opt/Signal/signal-desktop --type=renderer --enable-crash-reporter=6ce535c1-ac2e-4e0e-b205-5

$ gdb -p 815831
GNU gdb (Debian 13.1-3) 13.1

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f4bfddaf15f in __GI___poll (fds=0xfcc02f5f900, nfds=5, timeout=1194) at ../sysdeps/unix/sysv/linux/poll.c:29
29	../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) bt
#0  0x00007f4bfddaf15f in __GI___poll (fds=0xfcc02f5f900, nfds=5, timeout=1194) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f4bff11c9ae in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007f4bff11cacc in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x0000559e033df465 in  ()
#4  0x00000356601a234f in  ()
#5  0x0000000000001f40 in  ()
#6  0x000003566007ec8f in  ()
#7  0xaaaaaaaaaaaaaa00 in  ()
#8  0x00003d4400304350 in  ()
#9  0x00000000aaaaaa00 in  ()
#10 0xaaaaaa0100000000 in  ()
#11 0x0000000000000000 in  ()

$ gdb -p 815835
(gdb) bt
#0  0x00007f2c00d6e1f8 in __ppoll (fds=0x7ffd362efda8, nfds=1, timeout=<optimized out>, sigmask=0x7ffd362efc50) at ../sysdeps/unix/sysv/linux/ppoll.c:42
#1  0x000055b1a01512f8 in  ()
#2  0x00007ffd362efb40 in  ()
#3  0x000055b19cf121a0 in uv_tty_set_vterm_state ()
#4  0x0000000000000000 in  ()

$ gdb -p 815836
(gdb) bt
#0  0x00007f8a13869bc6 in __waitid (idtype=P_ALL, id=0, infop=0x7ffddbbb63a0, options=4) at ../sysdeps/unix/sysv/linux/waitid.c:29
#1  0x000055d8b8817267 in  ()
#2  0xaaaaaaaaaaaaaaaa in  ()
#3  0x000055d8b37491a0 in uv_tty_set_vterm_state ()
#4  0x0000000000000000 in  ()

$ gdb -p 815838
(gdb) bt
#0  0x00007f8a138921f8 in __ppoll (fds=0x7ffddbbb6568, nfds=1, timeout=<optimized out>, sigmask=0x7ffddbbb6410) at ../sysdeps/unix/sysv/linux/ppoll.c:42
#1  0x000055d8b69882f8 in  ()
#2  0xaaaaaaaaaaaaaaaa in  ()
#3  0x000055d8b37491a0 in uv_tty_set_vterm_state ()
#4  0x0000000000000000 in  ()
$ gdb -p 815872
(gdb) bt
#0  0x00007f2c00d6e15f in __GI___poll (fds=0x2ba4001197a0, nfds=3, timeout=4000) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f2c0211c9ae in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007f2c0211cacc in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x000055b1a0864465 in  ()
#4  0x000003567427d145 in  ()
#5  0x0000000000001f40 in  ()
#6  0x0000035673eac862 in  ()
#7  0xaaaaaaaaaaaaaa00 in  ()
#8  0x00002ba4000340d0 in  ()
#9  0x00000000aaaaaa00 in  ()
#10 0xaaaaaa0100000000 in  ()
#11 0x0000000000000000 in  ()
$ gdb -p 815917
(gdb) bt
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x7fff3fd71380, op=137, expected=0, futex_word=0x7fff3fd714a0) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common
    (futex_word=futex_word@entry=0x7fff3fd714a0, expected=expected@entry=0, clockid=clockid@entry=1, abstime=abstime@entry=0x7fff3fd71380, private=private@entry=0, cancel=cancel@entry=true)
    at ./nptl/futex-internal.c:87
#2  0x00007f56d3884efb in __GI___futex_abstimed_wait_cancelable64
    (futex_word=futex_word@entry=0x7fff3fd714a0, expected=expected@entry=0, clockid=clockid@entry=1, abstime=abstime@entry=0x7fff3fd71380, private=private@entry=0)
    at ./nptl/futex-internal.c:139
#3  0x00007f56d388783c in __pthread_cond_wait_common (abstime=0x7fff3fd71380, clockid=1, mutex=0x7fff3fd71450, cond=0x7fff3fd71478) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_timedwait64 (cond=0x7fff3fd71478, mutex=0x7fff3fd71450, abstime=0x7fff3fd71380) at ./nptl/pthread_cond_wait.c:643
#5  0x000055e7b3620398 in  ()
#6  0x000000000037ff86 in  ()
#7  0x0000000028a39834 in  ()
#8  0xaaaaaaaaaaaaaa00 in  ()
#9  0xaaaaaaaaaaaaaaaa in  ()
#10 0xaaaaaaaaaaaaaa00 in  ()
#11 0xaaaaaaaaaaaaaaaa in  ()
#12 0xaaaaaaaaaaaaaaaa in  ()
#13 0xaaaaaaaaaaaaaaaa in  ()
#14 0xaaaaaaaaaaaaaaaa in  ()
#15 0xaaaaaaaaaaaaaaaa in  ()
#16 0xaaaaaaaaaaaaaaaa in  ()
#17 0xaaaaaaaaaaaaaaaa in  ()
#18 0xaaaaaaaaaaaaaa00 in  ()
#19 0x00000ea40040c1d0 in  ()
#20 0x000000000037ff86 in  ()
#21 0x0000000017cbfac4 in  ()
#22 0x0000000000000000 in  ()
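
Attaching to each process by hand is tedious, and a plain `bt` only covers the thread gdb happens to select. A sketch of how the same collection could be scripted with gdb's batch mode and `thread apply all bt`; the helper names are hypothetical, `pgrep -f` is assumed to be available, and attaching still needs permission to ptrace the processes:

```python
import subprocess

def gdb_backtrace_command(pid):
    """Build a non-interactive gdb invocation that prints all thread backtraces."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]

def signal_desktop_pids():
    """List PIDs whose command line contains 'signal-desktop' (via pgrep -f)."""
    result = subprocess.run(
        ["pgrep", "-f", "signal-desktop"],
        capture_output=True, text=True, check=False,
    )
    return [int(token) for token in result.stdout.split()]

def collect_backtraces():
    """Return {pid: gdb output} for every matching process."""
    traces = {}
    for pid in signal_desktop_pids():
        run = subprocess.run(
            gdb_backtrace_command(pid),
            capture_output=True, text=True, check=False,
        )
        traces[pid] = run.stdout
    return traces
```

Without debug symbols for the Electron binary most frames will still show as `()`, as in the traces above, so the per-thread view mainly helps to see which threads are blocked in which system call.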

@dirwiz

dirwiz commented Jun 24, 2024

So as an update from me: after switching away from light-locker, I have not seen any hangs over months now.

After switching back to light-locker, I am still seeing the problem after having run signal-desktop in the background for 2 days.

@iridos Thanks for the tip. I removed light-locker from both a Debian and a Mint installation running XFCE and LightDM.
This definitely solved the problem for me. Hopefully @indutny-signal will find this helpful in reproducing the problem.

@mzguy

mzguy commented Jul 16, 2024

Sorry about this. I know it might seem like a reflex, but it is hard to be sure we know what we are looking at without debug log. Could you still submit one right after reproducing the issue? Thank you!

@indutny-signal I just noticed this. I did submit a debug log when asked. I'm running Ubuntu 20.04 LTS, a very popular and vanilla setup. I haven't installed any other screensavers or anything like that.

Did you see my debug log? Can you reproduce or find the root cause of this issue yet?

@jsn-0

jsn-0 commented Aug 9, 2024

I've been dealing with this issue with Debian 12 / Xfce / light-locker. Locking a session causes browser websocket connections to drop as well as any sort of electron based apps that use websockets. Basically any X11 apps that have a persistent network connection lose connectivity. SSH connections and cli utilities started from a session are unaffected.

Firefox and Chromium recover fine. Other apps vary. Signal fails to re-establish its connection (or outright becomes unresponsive and has to be killed). My current logs show Signal failing to re-establish a websocket since unlocking my computer.

{"level":30,"time":"2024-08-08T23:53:14.406Z","msg":"WebSocketResources.KeepAlive(unauthenticated).send: Sending a keepalive message"}
{"level":30,"time":"2024-08-08T23:53:36.234Z","msg":"WebSocketResources.KeepAlive(authenticated).send: Sending a keepalive message"}
{"level":30,"time":"2024-08-08T23:53:44.482Z","msg":"WebSocketResources.KeepAlive(unauthenticated).send: Sending a keepalive message"}
{"level":30,"time":"2024-08-08T23:54:06.322Z","msg":"WebSocketResources.KeepAlive(authenticated).send: Sending a keepalive message"}
{"level":30,"time":"2024-08-08T23:54:14.484Z","msg":"WebSocketResource(unauthenticated).close(3001)"}
{"level":40,"time":"2024-08-08T23:54:14.490Z","msg":"WebSocketResource(unauthenticated): Socket closed"}
{"level":40,"time":"2024-08-08T23:54:14.490Z","msg":"SocketManager: unauthenticated socket closed with code=3001 and reason=No response to keepalive request"}
{"level":30,"time":"2024-08-08T23:54:36.405Z","msg":"WebSocketResources.KeepAlive(authenticated).send: Sending a keepalive message"}
{"level":30,"time":"2024-08-08T23:54:44.497Z","msg":"WebSocketResources.KeepAlive(unauthenticated).send: Sending a keepalive message"}
{"level":30,"time":"2024-08-08T23:55:06.494Z","msg":"WebSocketResources.KeepAlive(authenticated).send: Sending a keepalive message"}
{"level":30,"time":"2024-08-08T23:55:14.498Z","msg":"WebSocketResource(unauthenticated).close: Already closed! 3001/No response to keepalive request"}
{"level":30,"time":"2024-08-08T23:55:36.590Z","msg":"WebSocketResources.KeepAlive(authenticated).send: Sending a keepalive message"}

@iridos
Author

iridos commented Dec 30, 2024

Just a reminder… I installed a new laptop with default debian and one and a half years later this still isn't fixed.

I think I actually provided some very good pointers … it feels like I provided you with all the debug logs in the world, including gdb backtraces. But you know… there's only a limited number of things one can do without knowing the code base and with limited, self-taught programming skills.
