Process.Unix: while reaping all processes, handle encountering direct children. #79817
… children. The process that runs as pid 1 is responsible for reaping orphaned processes. Since .NET 7, .NET applications running as pid 1 assume this responsibility. The code meant for reaping orphaned processes didn't account for encountering direct children. These child processes get reaped without updating the internal state. When the code later tries to reap such a child process it causes a FailFast because the process is missing.
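The failure mode can be reproduced in miniature outside of .NET. The following is a minimal Python sketch for a POSIX system, not the runtime's actual code: `ChildTracker`, `reap_all`, `wait_for` and `demo` are hypothetical names invented for this illustration. A "reap everything" loop calls `waitpid(-1, WNOHANG)` and may pick up a direct child. If it does not record that child's exit code in the internal state, a later wait on that pid fails with `ECHILD`, which is the analogue of the "process is missing" condition described above.

```python
import os
import time


class ChildTracker:
    """Toy stand-in for a runtime's table of child processes it started."""

    def __init__(self):
        self.exit_codes = {}  # pid -> exit code, or None while not yet known

    def spawn(self, exit_code):
        pid = os.fork()
        if pid == 0:
            os._exit(exit_code)  # child: terminate immediately
        self.exit_codes[pid] = None
        return pid

    def reap_all(self, update_state):
        """Reap every child that has already exited; return the reaped pids."""
        reaped = []
        while True:
            try:
                pid, status = os.waitpid(-1, os.WNOHANG)
            except ChildProcessError:
                return reaped  # no children left at all
            if pid == 0:
                return reaped  # children exist, but none has exited yet
            reaped.append(pid)
            if update_state and pid in self.exit_codes:
                # A direct child, not an orphan: record its exit code now,
                # because the kernel forgets it once waitpid has returned it.
                self.exit_codes[pid] = os.WEXITSTATUS(status)

    def wait_for(self, pid):
        """Return the exit code of a tracked child, reaping it if needed."""
        if self.exit_codes[pid] is None:
            _, status = os.waitpid(pid, 0)  # raises ECHILD if already reaped
            self.exit_codes[pid] = os.WEXITSTATUS(status)
        return self.exit_codes[pid]


def demo(update_state):
    tracker = ChildTracker()
    pid = tracker.spawn(7)
    reaped = []
    while pid not in reaped:  # poll until the reap-all loop has taken the child
        reaped += tracker.reap_all(update_state)
        time.sleep(0.01)
    try:
        return tracker.wait_for(pid)
    except ChildProcessError:
        return "missing"


if __name__ == "__main__":
    print(demo(update_state=False))  # prints "missing"
    print(demo(update_state=True))   # prints 7
```

With `update_state=False` the exit code is lost and the later wait cannot find the process. With `update_state=True` the reap-all loop hands the exit code to the tracker, so the later wait succeeds without touching the kernel again.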
Tagging subscribers to this area: @dotnet/area-system-diagnostics-process
I've not been able to run the reproducer from the issue.
I'll build a .NET dll for .NET 7 that @JaroslavMajera can use to verify this fixes the issue. |
You can ignore the errors. They are quite "normal" for headless Chromium. The reproducer starts Chromium in a loop, then kills/stops it, again and again. The failure is pretty random: sometimes it occurs after 10 seconds and sometimes it takes minutes. The most important part of the logs is the span from start dispose to end dispose. |
Can you try using the |
I am already on holiday. I will try to test it as soon as possible. Is it ok? |
I think you meant @adamsitnik |
Sure, when you have time for it. |
I tried again, and it occurred several times within 1-2 minutes after starting the container. With the fix, the container ran continuously for 50 minutes (and then I terminated it manually). @dotnet/area-system-diagnostics-process this is up for review. |
LGTM, big thanks for your help and the fix @tmds!
@jeffhandley @carlossanlop I would like to backport this fix to 7.0, but only after waiting a couple of weeks so that it can be tested. What is the deadline for the next 7.0 patch release? |
We will have a servicing window January 10-16. |
@adamsitnik will you start a backport PR? |
/backport to release/7.0 |
Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/3884447954 |
The backport PR is ready for Tactics review (#80433, label applied, approved, email sent). |
Fixes #79540.
@adamsit ptal
cc @JaroslavMajera