Thread 'stdin_handler' panicked along suspending system into sleep mode #1429

matu3ba · 2022-05-22T11:33:27Z

Unfortunately the issue is not easily reproducable and I dont have a backtrace/the panic handler did not tell what code broke.
Further, I broke the logs in /tmp/zellij-1000/zellij-log/ by starting a fresh instance, but I expect those would not tell much either.

The following output information were printed:

  × Thread 'stdin_handler' panicked.
  ├─▶ At /home/misterspoon/.cargo/registry/src/github.com-1ecc6299db9ec823/zellij-client-0.29.1/src/os_input_output.rs:256:14
  ╰─▶ could not poll stdin for readiness: Os { code: 4, kind: Interrupted, message: "Interrupted system call" }
  help: If you are seeing this message, it means that something went wrong.
        Please report this error to the github issue.
        (https://github.com/zellij-org/zellij/issues)

        Also, if you want to see the backtrace, you can set the `RUST_BACKTRACE` environment variable to `1`.

Basic information

zellij --version: zellij 0.29.1
tput lines: 30 (not relevant)
tput cols: 126 (not relevant)
uname -av or ver(Windows): Linux pc 5.17.9-arch1-1

List of programs you interact with as, PROGRAM --version: output cropped meaningful, for example:
nvim --version: NVIM v0.5.0-dev+1299-g1c2e504d5 (used the appimage release)
alacritty --version: alacritty 0.7.2 (5ac8060b)

Further information
To reproduce, I suspect that suspending system into sleep mode and waking up should reproduce the issue eventually.
Signaling code is very hard, so I would not be surprised that some dependency does not handle the signal interrupt mask or handlers correctly.

If you give me some pointers what you use to poll stdin I'll fix the signaling code myself. Ideally a portable solution should be used.
My suspection is that signal interrupts are not handled at all such that the bubbled up failure to poll crashes the application.

The text was updated successfully, but these errors were encountered:

tlinford · 2022-05-23T15:13:31Z

i think it could be related to #1334, which might explain why you are having trouble reproducing it consistently.

raphCode · 2022-05-24T07:42:13Z

Does this crash your session / the zellij server or are you able to reattach to your session afterwards?

raphCode · 2022-05-24T08:53:15Z

We actually have a dedicated signal handling thread here:

zellij/zellij-client/src/lib.rs

Line 239 in a7ddfe1

let _signal_thread = thread::Builder::new()

zellij/zellij-client/src/os_input_output.rs

Line 155 in a7ddfe1

fn handle_signals(&self, sigwinch_cb: Box<dyn Fn()>, quit_cb: Box<dyn Fn()>) {

So I am not sure why any signal still interrupts the polling syscall (probably poll or select returning EINTR) here:

zellij/zellij-client/src/os_input_output.rs

Lines 254 to 256 in a7ddfe1

    
           self.poll 
        
               .poll(&mut self.events, Some(self.timeout)) 
        
               .expect("could not poll stdin for readiness");

matu3ba · 2022-05-24T09:42:32Z

Thanks for the pointers!

Is EINTR already handled in Rust by retrying or are C semantics used?
SIGSTOP and SIGCONT signals are not handled, but I need to check first what the signaling semantics of mio poll are.
.expect("could not poll stdin for readiness"); reads like it doesnt handle this edge case (but I must check first)

raphCode · 2022-05-24T10:24:10Z

Disclaimer: I do not know this area of the code well, take the following with a grain of salt:

Judging from my Rust knowlegde, poll() very likely returns a Result which indicates success or failure of the call and contains the return value (on success) or error information (on failure).
unwrap() called on a Result returns the wrapped success value, or panics if the Result indicates failure. Since the latter happened, I assume that poll() returned a failure. To answer question 1, I think nothing is retried automatically, but instead the error is bubbled up.

The best way to fix this would be in my opinion to make sure the poll() doesn't get interrupted / all signals are delivered only to the signal thread (do signals work this way?).
Second best option is probably to just retry the poll if we receive that error.

matu3ba · 2022-05-24T10:37:06Z

do signals work this way?

Signal semantics on the OS work in the following way: There is a process mask and thread mask of "always guaranteed to be delivered signals" and realtime signals (user-definable), which are defined as list.
Signal handlers are registered process-wide, while signals can be send to each individual thread. Each individual thread has a signalmask it (or other threads) can can modify (sigprocmask). Except for pid 1, SIGKILL can not be ignored and leads to ~~thread~~ process teardown after Kernel tasks succeded (no custom handler possible).
SIGABRT can neither be ignored (unless pid 1), but the application can install a custom handler with LONGJMP or do nothing to not run execute the process teardown.
man signal-safety and man 7 signal are a good read.

Overall, signal handling is a pile of non-portable mess as signals are kinda like small interrupt handlers. So its best to mask everything and only allow necessary things in specified regions of code.

See also my PR on the abort() call in Zig ziglang/zig#11565 or musl abort().
One of the worst nightmares is that its not UB to call the signal handler start code from the signal handler (leading to recursive, potentially infinite, loops) and the horrible recursive locking and unlocking code in glibc.

matu3ba added the suspected bug label May 22, 2022

a-kenji added the stability Issues in relation to stability label May 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Thread 'stdin_handler' panicked along suspending system into sleep mode #1429

Thread 'stdin_handler' panicked along suspending system into sleep mode #1429

matu3ba commented May 22, 2022

tlinford commented May 23, 2022

raphCode commented May 24, 2022

raphCode commented May 24, 2022

matu3ba commented May 24, 2022

raphCode commented May 24, 2022

matu3ba commented May 24, 2022 •

edited

Loading

Thread 'stdin_handler' panicked along suspending system into sleep mode #1429

Thread 'stdin_handler' panicked along suspending system into sleep mode #1429

Comments

matu3ba commented May 22, 2022

tlinford commented May 23, 2022

raphCode commented May 24, 2022

raphCode commented May 24, 2022

matu3ba commented May 24, 2022

raphCode commented May 24, 2022

matu3ba commented May 24, 2022 • edited Loading

matu3ba commented May 24, 2022 •

edited

Loading