Process.Unix: while reaping all processes, handle encountering direct children. #79817

Merged
adamsitnik merged 1 commit into dotnet:main on Dec 23, 2022

Conversation

tmds
Member

@tmds tmds commented Dec 19, 2022

The process that runs as pid 1 is responsible for reaping orphaned processes. Since .NET 7, .NET applications running as pid 1 assume this responsibility.
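
As a rough illustration of what "reaping" means at the OS level, here is a minimal Python sketch of the kind of non-blocking reap loop a pid 1 process might run. It is not the .NET implementation; the helper name `reap_all` is hypothetical, and the only real API relied on is POSIX `waitpid` as exposed by `os.waitpid`.

```python
import os


def reap_all():
    """Collect every child that has already exited, without blocking.

    Returns a dict mapping pid -> raw wait status.
    """
    reaped = {}
    while True:
        try:
            # -1 means "any child"; WNOHANG makes the call return (0, 0)
            # instead of blocking when no child has exited yet.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            # ECHILD: this process currently has no children at all.
            break
        if pid == 0:
            # Children exist, but none of them has terminated yet.
            break
        reaped[pid] = status
    return reaped
```

Note that `waitpid(-1, ...)` does not distinguish between orphans inherited by pid 1 and processes the application started itself, which is where the problem described next comes from.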

The code meant for reaping orphaned processes didn't account for encountering direct children. These child processes get reaped without updating the internal state. When the code later tries to reap such a child process it causes a FailFast because the process is missing.
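
The failure mode can be sketched in a few lines of Python. This is an illustration of the mechanism under stated assumptions, not the actual change in this PR: the `tracked` table and the helpers `spawn`, `reap_any`, and `wait_for_tracked` are hypothetical stand-ins for the runtime's internal state. When a "reap any child" pass collects a direct child without recording it, a later targeted `waitpid` on that pid fails with `ECHILD` (errno 10 on Linux).

```python
import os

# Hypothetical bookkeeping: pid -> exit code (None while not yet reaped).
tracked = {}


def spawn(exit_code):
    """Fork a direct child that exits immediately, and track it."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)
    tracked[pid] = None
    return pid


def reap_any(update_tracked):
    """Block until any child exits, as a 'reap everything' pass would."""
    pid, status = os.waitpid(-1, 0)
    if update_tracked and pid in tracked:
        # The fix, in spirit: the reaped process is a direct child,
        # so record its exit code instead of silently discarding it.
        tracked[pid] = os.WEXITSTATUS(status)
    return pid


def wait_for_tracked(pid):
    """Later, the owner of `pid` asks for its exit code."""
    if tracked[pid] is not None:
        return tracked[pid]
    # Raises ChildProcessError (ECHILD) if the child was already reaped.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

Calling `reap_any(update_tracked=False)` and then `wait_for_tracked(pid)` raises `ChildProcessError`, which mirrors the missing-process condition described above; with `update_tracked=True` the exit code is served from the table instead.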

Fixes #79540.

@adamsit ptal
cc @JaroslavMajera

@ghost ghost added area-System.Diagnostics.Process community-contribution Indicates that the PR has been added by a community member labels Dec 19, 2022

@tmds
Member Author

tmds commented Dec 19, 2022

I've not been able to run the reproducer from the issue.
The chromium stuff that gets started doesn't seem to run for me:

Error find: '/root/.config/chromium/Crash Reports/pending/': No such file or directory
Error [1219/142740.712623:INFO:cpu_info.cc(53)] Available number of cores: 12
Error [1219/142740.712623:INFO:cpu_info.cc(53)] Available number of cores: 12
Error [1219/142740.712913:VERBOSE1:zygote_main_linux.cc(218)] ZygoteMain: initializing 0 fork delegates
Error [1219/142740.712910:VERBOSE1:zygote_main_linux.cc(218)] ZygoteMain: initializing 0 fork delegates
Error [1219/142740.713614:WARNING:discardable_shared_memory_manager.cc(197)] Less than 64MB of free space in temporary directory for shared memory files: 62
Error [1219/142740.735158:ERROR:bus.cc(399)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
Error [1219/142740.735362:ERROR:bus.cc(399)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
Error [1219/142740.735954:VERBOSE1:webrtc_internals.cc(120)] Could not get the download directory.
Error [1219/142740.736077:VERBOSE1:media_stream_manager.cc(1100)] MSM::InitializeMaybeAsync([this=0x55dbee546720])
Error [1219/142740.736112:VERBOSE1:media_stream_manager.cc(1100)] MDM::MediaDevicesManager()
Error [1219/142740.736135:VERBOSE1:media_stream_manager.cc(1100)] MSM::MediaStreamManager([this=0x55dbee546720]))
Error 
Error DevTools listening on ws://127.0.0.1:20000/devtools/browser/9172f1dc-2322-471b-8f0a-e2edba26e48d
Error [1219/142740.736742:VERBOSE1:key_storage_util_linux.cc(54)] Password storage detected desktop environment: (unknown)
Error [1219/142740.736768:VERBOSE1:key_storage_linux.cc(122)] Selected backend for OSCrypt: BASIC_TEXT
Error [1219/142740.736780:VERBOSE1:key_storage_linux.cc(142)] OSCrypt did not initialize a backend.
Error [1219/142740.736914:VERBOSE1:first_party_sets_handler_impl.cc(279)] Empty path. Failed initializing First-Party Sets database.
Error [1219/142740.737031:WARNING:bluez_dbus_manager.cc(247)] Floss manager not present, cannot set Floss enable/disable.
Error [1219/142740.738805:VERBOSE1:va_stubs.cc(734)] dlopen(libva.so.2) failed.
Error [1219/142740.738933:VERBOSE1:va_stubs.cc(736)] dlerror() says:
Error libva.so.2: cannot open shared object file: No such file or directory
Error [1219/142740.739245:ERROR:gpu_init.cc(523)] Passthrough is not supported, GL is disabled, ANGLE is 
Error [1219/142740.739433:VERBOSE1:simple_index_file.cc(600)] Simple Cache Index is being restored from disk.
Error [1219/142740.739468:VERBOSE1:simple_index_file.cc(600)] Simple Cache Index is being restored from disk.
Error [1219/142740.767058:VERBOSE1:media_stream_manager.cc(1100)] RFAOSF::Core() [process_id=4, frame_id=1]
Error [1219/142740.769398:VERBOSE1:media_stream_manager.cc(1100)] RFAOSF::Core() [process_id=4, frame_id=1]
Error [1219/142740.772248:VERBOSE1:configured_proxy_resolution_service.cc(803)] PAC support disabled because there is no system implementation
Error [1219/142740.772609:VERBOSE1:configured_proxy_resolution_service.cc(803)] PAC support disabled because there is no system implementation
start dispose
kill
wait
Error 
Data 
dispose
end dispose
...

I'll build a .NET dll for .NET 7 that @JaroslavMajera can use to verify this fixes the issue.

@JaroslavMajera

You can ignore the errors; they are quite normal for headless Chromium. The reproducer starts Chromium in a loop, kills it, and starts it again, over and over. Failure is pretty random. Sometimes it fails after 10 seconds and sometimes it takes minutes. The most important part of the logs is the span from "start dispose" to "end dispose".

@tmds
Member Author

tmds commented Dec 19, 2022

Can you try using the System.Diagnostics.Process.dll from
Process.dll.tar.gz?
For a framework-dependent app, you need to overwrite the copy in the dotnet installation at ./shared/Microsoft.NETCore.App/7.0.0/System.Diagnostics.Process.dll.

@JaroslavMajera

I am already on holiday. I will try to test it as soon as possible. Is it ok?

@danmoseley
Member

I think you meant @adamsitnik

@tmds
Member Author

tmds commented Dec 19, 2022

> I am already on holiday. I will try to test it as soon as possible. Is it ok?

Sure, when you have time for it.
I may have another go at it myself, but so far the reproducer did not reproduce for me. Maybe I just need to wait longer.

@tmds
Member Author

tmds commented Dec 21, 2022

> Failure is pretty random. Sometimes it fails after 10 seconds and sometimes it takes minutes.

I tried again, and it occurred several times within 1-2 minutes after starting the container.

With the fix, the container ran continuously for 50 minutes (and then I terminated it manually).

@dotnet/area-system-diagnostics-process this is up for review.

Member

@adamsitnik adamsitnik left a comment

LGTM, big thanks for your help and the fix @tmds!

@adamsitnik adamsitnik merged commit c5c56a6 into dotnet:main Dec 23, 2022
@adamsitnik
Member

@jeffhandley @carlossanlop I would like to backport this fix to 7.0, but only after waiting a couple of weeks so it can be tested first. What is the deadline for the next 7.0 patch release?

@jeffhandley
Member

We will have a servicing window January 10-16.

@tmds
Member Author

tmds commented Jan 10, 2023

@adamsitnik will you start a backport PR?

@adamsitnik
Member

/backport to release/7.0

@github-actions
Contributor

Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/3884447954

@adamsitnik
Member

The backport PR is ready for Tactics review (#80433, label applied, approved, email sent).

@ghost ghost locked as resolved and limited conversation to collaborators Feb 9, 2023

Successfully merging this pull request may close these issues.

.NET 7 - Process terminated. Error while reaping child. errno = 10