Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

importlib.reload is not thread-safe #126548

Open
bswck opened this issue Nov 7, 2024 · 9 comments
Open

importlib.reload is not thread-safe #126548

bswck opened this issue Nov 7, 2024 · 9 comments
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes topic-importlib type-bug An unexpected behavior, bug, or error

Comments

@bswck
Copy link
Contributor

bswck commented Nov 7, 2024

Bug report

Bug description:

There are two problems with importlib.reload regarding thread safety, and both are race conditions.
I naturally assume the function should be thread-safe, consistently with other components of importlib.
I never ran into someone reloading the same module from different threads, but I can imagine such a scenario happen in a large, self-aware codebase.

Problem 1: Race condition when the same module is reloaded from two threads simultaneously

This guard in importlib.reload is probably supposed to do two things:

if name in _RELOADING:
return _RELOADING[name]
_RELOADING[name] = module

  1. If a module tries to reload itself (before _bootstrap._exec finishes execution), return it immediately.
  2. If two or more threads try to reload one module at a time, only let one of them through and return still-reloaded module to the other threads. In other words, make other threads access the not-yet-reloaded module eagerly, while it is being reloaded by the first thread to have initiated the reload.

Assuming (2) is what the snippet is supposed to do, I'm getting consistent failures to fulfill that goal.

To reproduce:

# ./repro.py
import importlib
import threading

import reloadee

counter = 0
reloading = True
threading.Thread(target=importlib.reload, args=(reloadee,)).start()
threading.Thread(target=importlib.reload, args=(reloadee,)).start()
# ./reloadee.py
import __main__

if getattr(__main__, "reloading", False):
    count = __main__.counter = __main__.counter + 1
    if count > 1:
        print(f"race condition: reloaded again")

Place the above files in the current working directory.

If python repro.py (test on whichever supported Python version you wish) outputs race condition: reloaded again, the repro script hit a race condition, because the module reloadee was reloaded more than once, by separate threads.
If it outputs nothing, it's a case where the race condition was missed.

Consequently, running

(for _ in {1..100}; do python repro.py; done) | wc -l

consistently yields numbers in range 80-90 on my personal machine.

Note

There is no case of module contents being executed in parallel, because _bootstrap._exec executes only one module at a time (thanks to _ModuleLockManager):

def _exec(spec, module):
"""Execute the spec's specified module in an existing module's namespace."""
name = spec.name
with _ModuleLockManager(name):

Problem 2: Classic LBYL race condition

Because the current implementation doesn't synchronize _RELOADING access, it is easy to see that between those lines

if name in _RELOADING:
return _RELOADING[name]

the reloaded module (_RELOADING[name]) can disappear from _RELOADING, in case a different thread executes

del _RELOADING[name]

I think you can imagine this being pretty likely to happen either :)
It is nicely explained in the glossary.
But let me know if we still need a repro for that one too.

Suggested solution

A simple RLock() (not Lock(), see point 1. in Problem 1 – same thread that invoked reload can re-run it, and thus should re-enter the lock without being blocked) only around the critical section

if name in _RELOADING:
return _RELOADING[name]
_RELOADING[name] = module

should suffice.

Please let me know what you think about that suggestion. I'm wondering how to test this future fix specifically – I tend to like threading.Barrier and time.sleep to ensure 100% reproducibility of cases, but maybe there's a better way?

Expect a PR from me as soon as the decision (fix / document thread-unsafety) is made!

CPython versions tested on:

3.9, 3.10, 3.11, 3.12, 3.13, 3.14, CPython main branch

Operating systems tested on:

Linux, macOS, Windows

@bswck bswck added the type-bug An unexpected behavior, bug, or error label Nov 7, 2024
@bswck
Copy link
Contributor Author

bswck commented Nov 7, 2024

Quick note: if reload is not intended to be thread-safe, we can skip the fix and instead document its thread-unsafety.

@ZeroIntensity ZeroIntensity added 3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes labels Nov 7, 2024
@ZeroIntensity
Copy link
Member

Hmm, how does this affect <3.13? Threads shouldn't be able to execute and trigger the race condition because of the GIL, so this should theoretically only exist on free-threaded builds, which are only available on 3.13+

@brettcannon brettcannon removed the 3.12 bugs and security fixes label Nov 7, 2024
@brettcannon
Copy link
Member

brettcannon commented Nov 7, 2024

I'm of two minds on this one. First, importlib.reload() is really only meant to be used in the REPL when you're editing code live. Second, I'm sure some people are doing "interesting" things with the function, so not having it be broken would be good.

I think that importlib is on the hot path for interpreter start-up and some folks have tried to make it load faster, so I'm not sure if importing threading for the lock is going to make, e.g., @vstinner grumpy since I think he did an import tweak previously to minimize imports. So we may need the import threading call to be local to the function.

As for testing, we tend to dislike sleeps since it holds all the testing up. Honestly, the logic behind the lock is simple enough that just passing the test suite as-is should be fine.

And I agree with @ZeroIntensity that this doesn't seem to be a concern prior to Python 3.13 (hence why I dropped 3.12 from the effected versions).

@ZeroIntensity
Copy link
Member

Well, I was able to get the race condition: reloaded again message to appear on 3.12, but I'm not sure that's a bug. It seems normal to me that threads might run in the middle of reloading, because the GIL is temporarily released during the creation of threads, allowing them to reload before everything is up and running.

What I am worried about is the error message that can come up on the free-threaded builds:

ImportError: module reloadee not in sys.modules

Anyways, we need some locking around it. @bswck said he would submit a PR, so we can wait on that 😄. In the meantime, I'd be curious to hear if @ncoghlan has any input on this, because she had a thread on discord last week about importlib thread safety.

@colesbury
Copy link
Contributor

A few thoughts:

  • This is not free threading specific, including the ImportError: module reloadee not in sys.modules. (I reproduced that on 3.9.10 on macOS)
  • It seems fine to me if importlib.reload() is not thread-safe and only intended for REPL usage. It also seems okay to me to make it thread-safe.

I'd be a bit hesitant about the proposed RLock approach because:

  • It would still sometimes return a non-reloaded module if importlib.reload() is called from multiple threads. I don't think that counts as thread-safe.
  • It's prone to lock ordering deadlocks in some circumstances. The _ModuleLockManager makes substantial effort to avoid lock ordering deadlocks; if we want importlib.reload() to be similarly robust it might have to be similarly complex.

@ZeroIntensity
Copy link
Member

  • This is not free threading specific, including the ImportError: module reloadee not in sys.modules. (I reproduced that on 3.9.10 on macOS)

Odd, I can't reproduce that error on a GILful build at all, even with thousands of threads. I guess my system is just being silly.

@bswck
Copy link
Contributor Author

bswck commented Nov 8, 2024

Thanks for the invaluable input.

First, importlib.reload() is really only meant to be used in the REPL when you're editing code live.

Makes sense. It is indeed what the docs highlight: This is useful if you have edited the module source file using an external editor and want to try out the new version without leaving the Python interpreter. However, what we could do if we lean toward not making importlib.reload thread-safe is highlighting the primary intent even more, introducing keywords such as "only" and "not guaranteed", as in sys._getframe docs.

This is not free threading specific, including the ImportError: module reloadee not in sys.modules. (I reproduced that on 3.9.10 on macOS)

Yes! That's what I forgot to underline while writing this. It targets every supported version.

The behavior we'd expect of reload in a multithreaded scenario isn't as it is now, whether the GIL is or isn't enabled: it can't be thread 2 immediately interferes with the unfinished state of _RELOADING from thread 1. This is the primary concern reported in the issue.

I never reproduced ImportError: module reloadee not in sys.modules, but I can guess the reason for this race condition (that I overlooked) occuring; I never discovered it most likely because I had never tested this issue on more than 2 threads—from reading through the code, I suspect the reason to be that some thread runs

if sys.modules.get(name) is not module:
raise ImportError(f"module {name} not in sys.modules", name=name)

just between these two statements carried out by some other thread:

module = sys.modules.pop(spec.name)
sys.modules[spec.name] = module

which is quite a time distance and is much more likely to occur with more threads involved.
Hence, this is a GIL-agnostic case.

It seems fine to me if importlib.reload() is not thread-safe and only intended for REPL usage. It also seems okay to me to make it thread-safe.

I'd personally enjoy it being thread-safe, but my personal joy doesn't necessarily align to optimal practical implications (potential performance degradation, code complication for a case that's never been reported in now more than 10 years of existence). Thank you @jaraco for hinting me in that matter.

It would still sometimes return a non-reloaded module if importlib.reload() is called from multiple threads. I don't think that counts as thread-safe.

Great point, and a mistake in my initial assumption that the function should "make other threads access the not-yet-reloaded module eagerly, while it is being reloaded by the first thread to have initiated the reload".
What I'd then expect in a multi-threaded scenario is that the returned module's state is ready, which already feels like a burden for a use case that's theoretical.

After all, I'm leaning toward giving up on introducing new lock mechanisms for navigating this and instead documenting the intent that this function has had for years.

If this issue gathers interest in the future and proves necessary to make importlib.reload thread-safe for a rational use case, necessary effort may be put into making importlib.reload thread-safe (where that use case, let's be honest, is hard to imagine).
(To emphasize that, just know that I found this just by reading the code of importlib, completely out of any practical context).

So, what option (1. emphasize REPL-only intent, 2. make thread-safe) would you be more inclined to from a clearly practical point of view? Can you imagine a use case where importlib.reload is proven useful for multi-threaded scenarios? @brettcannon @colesbury @ncoghlan @Eclips4 @ZeroIntensity

@ZeroIntensity ZeroIntensity added the 3.12 bugs and security fixes label Nov 8, 2024
@brettcannon
Copy link
Member

I say just make the docs stronger around what people's expectations should be, but if you want to give it a go to fix and the PR is simple enough without slowing down importing we can at least consider it.

@bswck
Copy link
Contributor Author

bswck commented Nov 18, 2024

but if you want to give it a go to fix and the PR is simple enough without slowing down importing we can at least consider it.

I will definitely learn a lot from giving a shot to implementing a fix, so let's go!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes topic-importlib type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants