Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: catch and report errors about tty I/O (#882) #1051

Merged
merged 2 commits into from
Mar 25, 2022

Conversation

raphCode
Copy link
Contributor

@raphCode raphCode commented Feb 10, 2022

I tried to fix #882 by logging the corresponding errors instead of crashing.
This is a quick fix of the symptoms (server crash), and does not clear the underlying issue (I suspect a race condition around the tty file descriptor handling).
I am not experienced enough in the codebase to resolve the race condition, but I may look into it if someone gives me directions.

More details are in the github issue and in the commit message (if someone git blames someday in the future).

Quick and dirty bandaid fix to some server crashes which occur to me lately.
The underlying issue seems to be a race condition somewhere when the shell in the pane
exits and the tty file descriptor becomes invalid, but zellij wants to write/read it?

Bug trigger:
- open some panes
- exit the shells in the panes by spamming Ctrl-D

works best when the system only runs on a single CPU, run the following to disable all
cores but one:
echo 0 | sudo tee /sys/devices/system/cpu/cpu*/online
@raphCode
Copy link
Contributor Author

Thanks for running the CI.
An e2e test failed, I don't know why and it really shouldn't because of this small logging change.
Since my fork was 39 commits behind, I merged the new changes in, hoping this may fix it.

Please approve the CI again.

@tlinford
Copy link
Contributor

I tried to fix #882 by logging the corresponding errors instead of crashing. This is a quick fix of the symptoms (server crash), and does not clear the underlying issue (I suspect a race condition around the tty file descriptor handling). I am not experienced enough in the codebase to resolve the race condition, but I may look into it if someone gives me directions.

More details are in the github issue and in the commit message (if someone git blames someday in the future).

Looked a bit more into this, and also do not know how to properly fix this at the moment, I suspect some pretty big changes would be necessary.
This fix seems very reasonable for now though 👍

@tlinford tlinford merged commit bda37c3 into zellij-org:main Mar 25, 2022
@raphCode
Copy link
Contributor Author

Yay, my first merged PR ❤️

I suspect some pretty big changes would be necessary.

Out of interest:
Do you think the race condition happens in the kernel / between zellij and other program?
Because safe Rust should not have any race condition bugs...

@tlinford
Copy link
Contributor

Out of interest:
Do you think the race condition happens in the kernel / between zellij and other program?
Because safe Rust should not have any race condition bugs...

My understanding is a bit limited of the full flow of opening ptys etc - but as far as I can tell the heart of the issue here is that we are passing around and using RawFd, which is effectively just a number, so rust can't really prevent us trying to write to a closed file with this api.

@raphCode
Copy link
Contributor Author

Yep, Raw* sounds pretty unsafe to me :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash when exiting shell on single CPU systems
3 participants