gh-105699: Use a Linked List for the PyModuleDef Cache #106899
Conversation
I'm not sure the assumption that a linked list is good enough because the number of extensions is small is actually warranted, and I think we should definitely double-check the performance on systems with non-trivial numbers of modules. Not using a dict for the cache avoids some problems, but replacing it with a linear search is very likely to have a bad performance hit.
Since the problem being fixed only manifests itself with isolated subinterpreters, I would rather live with the race in 3.12 than rush in a big performance regression, even if it's just for imports of extension modules.
_PyInterpreterState_Main());
/* The lock is initialized directly with the general runtime state. */
assert(EXTENSIONS.mutex != NULL);
//assert(EXTENSIONS.head == NULL);
Leftover line I presume.
Sort of. I had expected the assert to work (and it would in a simpler world). I was going to circle back to it but, having slept on it, don't see the need. It can be dropped.
@@ -919,167 +919,134 @@ extensions_lock_release(void)
static void
_extensions_cache_init(void)
Why have this function at all, now that it's just a single assert?
Honestly, I figured I'd end up finding a better solution than a linked list that would revive the need for this function, so I kept it around. The point is a bit moot now though. 😄
goto finally;
}
strcpy(cached->filename, PyUnicode_AsUTF8(filename));
Use strncpy to avoid the deprecation in some compilers (macOS IIRC?)
}
res = 0;

/* Destroy the cache data. */
Shouldn't this DECREF `found->def`?
good catch
Understood. I was looking for a quick fix and mostly played a hunch that the performance impact would be minimal. I don't think I'm wrong but agree there isn't enough time to properly validate my hypothesis. Thanks for speaking up. I do plan on finding an alternative that doesn't rely on a Python object, to address the crasher.
This fixes a crasher due to a race condition, triggered infrequently when two isolated (own-GIL) subinterpreters simultaneously initialize their sys or builtins modules. The crash happened due to the combination of the "detached" thread state we were using and the "last holder" logic we use for the GIL. It turns out it's tricky to use the same thread state for different threads. Who could have guessed? <wink/>
We solve the problem by eliminating the one object we were still sharing between interpreters. We replace it with a linked list, using the "raw" allocator to avoid tying it to the main interpreter. We assume that there will be few enough legacy extension modules loaded that the O(n) operations won't cause too much overhead.
We also remove the accommodations for "detached" thread states, which were a dubious idea to start with.