All node processes hang in uninterruptible deep sleep on linux #55587

Closed as not planned
@argmaxmax

Description

Version

v20.17.0

Platform

Linux 6.6.58 #1-NixOS SMP PREEMPT_DYNAMIC Tue Oct 22 13:46:36 UTC 2024 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

Serving or installing any previously working Node project results in the node process hanging uninterruptibly.

The SIGKILL signal has no effect. It seems that the handler for syscall 281 (epoll_pwait) goes into deep sleep and, as a kernel thread, suspends signal handling until completion.

However, other syscalls are still being made.
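
The "uninterruptible sleep" claim can be checked by reading per-thread state from procfs. The following is a minimal sketch, not part of the original report: it assumes a Linux `/proc` layout, and the names `parse_stat` and `thread_states` are hypothetical. A state of `D` means uninterruptible sleep and `S` means interruptible sleep; `wchan` names the kernel function the thread is sleeping in, when the kernel exposes it.

```python
from pathlib import Path


def parse_stat(line: str) -> tuple[str, str]:
    """Return (comm, state) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the last ')' instead of on whitespace.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return comm, state


def thread_states(pid: int) -> dict[int, tuple[str, str, str]]:
    """Map each thread ID of `pid` to (comm, state, wchan)."""
    result = {}
    for task in Path(f"/proc/{pid}/task").iterdir():
        try:
            comm, state = parse_stat((task / "stat").read_text())
            wchan = (task / "wchan").read_text().strip()
        except OSError:
            continue  # thread exited, or file unreadable, while scanning
        result[int(task.name)] = (comm, state, wchan)
    return result
```

Calling `thread_states(pid)` with the PID of the hung node process lists one entry per thread. If only some threads report `D` while others report `S` or `R`, that would be consistent with the observation that other syscalls are still being made.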

Furthermore, the process still seems to be doing some work. Here is the output of running:

NPM_DEBUG=true strace -f -o strace.log npm install
---SNIPPED---
38667 madvise(0x1c52b7e00000, 262144, MADV_DONTNEED) = 0
38667 futex(0x28962cb4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
38657 <... epoll_pwait resumed>[], 1024, 78, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\213", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 80, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\231", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 79, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\271", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 80, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\270", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 80, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\274", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 80, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\264", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 80, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\246", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 80, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\247", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 80, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\207", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 79, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\217", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 79, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\213", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 79, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
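
In the excerpt above, thread 38657 keeps returning from `epoll_pwait` after timeouts of about 80 ms and writes cursor-control sequences plus a three-byte UTF-8 character to fd 25 (which looks like a progress spinner being redrawn), while thread 38667 is parked in `futex`. To see this across the full `strace.log`, a rough sketch such as the one below can count calls per thread and report which call each thread is still blocked in. The `summarize` helper and its regular expressions are illustrative assumptions and only handle the line shapes visible in this log.

```python
import re
from collections import Counter

# "38657 epoll_pwait(17, ..." -> thread ID and syscall name
CALL = re.compile(r"^(\d+)\s+(\w+)\(")
# "38657 <... epoll_pwait resumed>..." -> thread ID of a call that finished
RESUMED = re.compile(r"^(\d+)\s+<\.\.\. (\w+) resumed>")


def summarize(lines):
    """Return (calls, pending).

    calls:   Counter keyed by (tid, syscall) for every call started in the log.
    pending: tid -> syscall for calls marked unfinished and never resumed.
    """
    calls = Counter()
    pending = {}
    for line in lines:
        m = RESUMED.match(line)
        if m:
            pending.pop(int(m.group(1)), None)
            continue
        m = CALL.match(line)
        if not m:
            continue  # "---SNIPPED---", signals, exit lines, etc.
        tid, name = int(m.group(1)), m.group(2)
        calls[(tid, name)] += 1
        if line.rstrip().endswith("<unfinished ...>"):
            pending[tid] = name
    return calls, pending


# A few lines copied from the log above.
SAMPLE = r"""
38667 madvise(0x1c52b7e00000, 262144, MADV_DONTNEED) = 0
38667 futex(0x28962cb4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
38657 <... epoll_pwait resumed>[], 1024, 78, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
""".strip().splitlines()

calls, pending = summarize(SAMPLE)
print(calls.most_common())
print(pending)
```

On the full file, `summarize(open("strace.log"))` gives the same two structures. A thread that stays in `pending` is one that never returned from its last syscall within the traced window.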

How often does it reproduce? Is there a required condition?

Unconditionally reproducible in several Node projects, even the simple official Vite templates.

What is the expected behavior? Why is that the expected behavior?

The process completes the installation of the node dependencies in a reasonable time, all the while displaying status information.

If the process needs to be killed, sending SIGTERM or SIGKILL should terminate it in a reasonable time.
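
The expected kill behavior can be expressed as a small test harness. This is only a sketch under assumptions: `stop` and `grace` are hypothetical names, and it relies on Python's standard `subprocess` module on a POSIX system. It sends SIGTERM, waits, escalates to SIGKILL, and reports `"stuck"` if the process still has not exited, which is the symptom described in this issue.

```python
import signal
import subprocess


def stop(proc: subprocess.Popen, grace: float = 5.0) -> str:
    """Ask `proc` to exit with SIGTERM; escalate to SIGKILL after `grace` seconds.

    Returns "SIGTERM" or "SIGKILL" depending on which wait succeeded,
    or "stuck" if the process is still alive after SIGKILL.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        pass
    proc.kill()  # sends SIGKILL on POSIX
    try:
        proc.wait(timeout=grace)
        return "SIGKILL"
    except subprocess.TimeoutExpired:
        return "stuck"
```

For example, `stop(subprocess.Popen(["npm", "install"]))` would be expected to return `"SIGTERM"` or `"SIGKILL"` on a healthy system. SIGKILL cannot be caught or ignored by the process, but a thread in uninterruptible sleep does not act on it until it leaves that state, so `"stuck"` would match the behavior reported here.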

What do you see instead?

Instead, the process hangs indefinitely.

Furthermore, killing the process is impossible.

Additional information

No response

Metadata

Labels

linux: Issues and PRs related to the Linux platform.
