Skip to content

bpo-45021: Fix a hang in forked children #28007

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Sep 20, 2021
Merged

bpo-45021: Fix a hang in forked children #28007

merged 4 commits into from
Sep 20, 2021

Conversation

0x0L
Copy link
Contributor

@0x0L 0x0L commented Aug 27, 2021

_global_shutdown_lock should be reinitialized in forked children

https://bugs.python.org/issue45021

_global_shutdown_lock should be reinitialized in forked children
@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

@0x0L

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@pitrou
Copy link
Member

pitrou commented Aug 31, 2021

It would be nice to add a simple test for this.

@pitrou pitrou requested a review from gpshead August 31, 2021 12:24
@0x0L
Copy link
Contributor Author

0x0L commented Aug 31, 2021

I'm not too sure how to test for a freeze: should I just put a minimal reproducer somewhere and let it freeze if run on unpatched version ?

@pitrou
Copy link
Member

pitrou commented Aug 31, 2021

Yes, that would work.

@0x0L 0x0L requested a review from vstinner September 13, 2021 17:24
@ambv ambv added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Sep 15, 2021
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @ambv for commit 02891f9 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Sep 15, 2021
@0x0L
Copy link
Contributor Author

0x0L commented Sep 18, 2021

Build failures look unrelated to this patch

@vstinner
Copy link
Member

buildbot/AMD64 Arch Linux Asan Debug PR

Unrelated and already fixed by: python/buildmaster-config@576b13d

buildbot/x86-64 macOS PR

Unrelated and known issue: https://bugs.python.org/issue43592

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @pitrou: Would you mind to double check the fix?

@gpshead gpshead added needs backport to 3.10 only security fixes type-bug An unexpected behavior, bug, or error needs backport to 3.9 only security fixes labels Sep 20, 2021
@gpshead
Copy link
Member

gpshead commented Sep 20, 2021

because this test is mixing fork and threads which is a posix-nono it will probably run into hanging problems of its own on some platforms or build setups. If/When that happens, we can conditionally enable it only for a a known system and build configs it happens to not hang on. merging anyways, we'll untangle that when it happens.

@gpshead gpshead merged commit 0bfa110 into python:main Sep 20, 2021
@miss-islington
Copy link
Contributor

Thanks @0x0L for the PR, and @gpshead for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9, 3.10.
🐍🍒⛏🤖

@bedevere-bot
Copy link

GH-28480 is a backport of this pull request to the 3.10 branch.

@bedevere-bot bedevere-bot removed needs backport to 3.10 only security fixes needs backport to 3.9 only security fixes labels Sep 20, 2021
@bedevere-bot
Copy link

GH-28481 is a backport of this pull request to the 3.9 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 20, 2021
_global_shutdown_lock should be reinitialized in forked children
(cherry picked from commit 0bfa110)

Co-authored-by: nullptr <3621629+0x0L@users.noreply.github.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 20, 2021
_global_shutdown_lock should be reinitialized in forked children
(cherry picked from commit 0bfa110)

Co-authored-by: nullptr <3621629+0x0L@users.noreply.github.com>
@vstinner
Copy link
Member

I'm happy to see my Lock._at_fork_reinit() method being used :-) https://bugs.python.org/issue40089

@vstinner
Copy link
Member

Maybe at some point, we should consider to make the method public?

miss-islington added a commit that referenced this pull request Sep 20, 2021
_global_shutdown_lock should be reinitialized in forked children
(cherry picked from commit 0bfa110)


Co-authored-by: nullptr <3621629+0x0L@users.noreply.github.com>

Automerge-Triggered-By: GH:gpshead
miss-islington added a commit that referenced this pull request Sep 20, 2021
_global_shutdown_lock should be reinitialized in forked children
(cherry picked from commit 0bfa110)


Co-authored-by: nullptr <3621629+0x0L@users.noreply.github.com>

Automerge-Triggered-By: GH:gpshead
niyas-sait pushed a commit to niyas-sait/cpython that referenced this pull request Sep 21, 2021
_global_shutdown_lock should be reinitialized in forked children
@0x0L
Copy link
Contributor Author

0x0L commented Sep 28, 2021

@vstinner @gpshead
It looks like there's an issue with this patch. Running the following in on an unpatched python

import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.thread import _global_shutdown_lock

if hasattr(os, 'register_at_fork'):
    os.register_at_fork(before=_global_shutdown_lock.acquire,
                        after_in_child=_global_shutdown_lock._at_fork_reinit,
                        ​after_in_parent=_global_shutdown_lock.release)

with ProcessPoolExecutor() as pool:
    r = list(pool.map(str.upper, ['a']))

fails in some env.... Oddly I have two conda env with the exact same hash for the python package and this example hangs up in one env and not the other.

Calling os.register_at_fork only with the after_in_child looks ok and actually seems more appropriate.
0x0L@dc2d0c9 passes the test

Any thoughts ? Should I open a new issue or can we reopen this one ?

@gpshead
Copy link
Member

gpshead commented Sep 28, 2021

I cannot reproduce any hang with that snippet on such a python revision.

While not technically necessary for this issue bug "fix" (quotes because all use of threading+fork in one process is a bad idea), It still seems right to grab the lock across the fork, otherwise the forked child could be in a strange state w.r.t. the locks protected resources.

@0x0L
Copy link
Contributor Author

0x0L commented Sep 29, 2021

I also struggle to reproduce it. It only happens - consistently though - in a single one of my envs.
I'll try to find some time to attach a gdb and investigate a bit more.

@0x0L
Copy link
Contributor Author

0x0L commented Sep 30, 2021

:) Actually the python env where it didn't work was already patched by hand... Everything's look ok.
My apologies

@vstinner
Copy link
Member

Don't worry, it happens to everybody. Thanks for double checking!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants