Open Questions
The idea is to introduce a mode in which using a subinterpreter is less perilous. This mode could be optional for subinterpreters (at the C level), though it's probably not worth it.
- disallow threading
- disallow forking (already done regardless)
- eliminate (or ignore) the GIL within each subinterpreter
- start the subinterpreter in its own thread (and pin it there)
- (maybe) add mode management
From email to @emilydmorehouse on 14 July 2018:
I've been thinking through what the "main" thread is for a given
interpreter. I have some ideas and wanted to share them before I get
too far into it.
Here's a summary of what I'm imagining:
* there is only one "main" thread: the one where the main interpreter
started running (same as now)
* we add a "_PyRuntimeState.main_thread" field to record this
* each interpreter has an "active" thread where pending calls are made
* the "active" thread of the main interpreter will always be
"_PyRuntimeState.main_thread"
"active" isn't necessarily the right description for the thread, so
I'm open to better names. It's better than my initial choice,
"manager". :)
Based on the idea of an "active" thread, here are some options:
1. each interpreter runs pending calls in its active thread
* this is a continuation of what we did Friday
* we rename "PyInterpreterState.ceval.pending.main_thread" to
"active_thread" (hence pending calls will be made in the active thread
only)
* operations on "active_thread" would be protected by a lock
* we do not initialize "active_thread" to the thread where the
interpreter was created
* instead, we initialize "active_thread" to something like
"_PyEval_NO_ACTIVE_THREAD" (probably use a value of -1)
* when the eval loop is run (i.e. when PyEval_EvalFrameEx is
called), we immediately set "active_thread" if it isn't set already
* when the eval loop completes we set "active_thread" to another
thread where the interpreter has an eval loop already running (if any)
or to _PyEval_NO_ACTIVE_THREAD
* when an interpreter is finalized it won't be running so
"active_thread" will necessarily be _PyEval_NO_ACTIVE_THREAD
* at that point if there are any pending calls then we create a
new thread under the interpreter and make the pending calls there
2. like #1, but at finalization we:
* create a new PyThreadState to use for the pending calls
* temporarily commandeer the active thread of another interpreter
(we're guaranteed that the main interpreter will have an active
thread)
* switch to the new thread state and make the pending calls
* switch back to running the original interpreter
3. like #1, but we guarantee that pending calls are always made "soon"
* in our new _Py_Add_Pending_Call() we check if "active_thread" is
set to _PyEval_NO_ACTIVE_THREAD
* if it is then we take the approach from the last point of #1
(finalization) and immediately spin up a new thread to make the pending
calls
4. like #3, but we take the approach from #2 and temporarily
commandeer the active thread of another interpreter to make the
pending calls
* at that point, since we've already taken the hit by creating that
thread state we should consider preserving it to use again in the same
situation
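The "active thread" mechanics shared by options #1-#4 can be sketched as a toy Python model (all names are illustrative; `NO_ACTIVE_THREAD` mirrors the proposed `_PyEval_NO_ACTIVE_THREAD` sentinel and is an assumption, not a real CPython API):

```python
import threading

# Sentinel for "no active thread"; the email suggests a value of -1.
NO_ACTIVE_THREAD = -1

class InterpreterState:
    """Toy model of option #1: pending calls run only in the "active" thread."""

    def __init__(self):
        self.lock = threading.Lock()          # protects active_thread
        self.active_thread = NO_ACTIVE_THREAD
        self.running_threads = set()          # threads with an eval loop running
        self.pending = []                     # queued (func, arg) pending calls

    def eval_loop_entered(self, tid):
        with self.lock:
            self.running_threads.add(tid)
            if self.active_thread == NO_ACTIVE_THREAD:
                self.active_thread = tid

    def eval_loop_exited(self, tid):
        with self.lock:
            self.running_threads.discard(tid)
            if self.active_thread == tid:
                # Hand off to another thread with a running eval loop, if any.
                self.active_thread = (next(iter(self.running_threads))
                                      if self.running_threads
                                      else NO_ACTIVE_THREAD)

    def add_pending_call(self, func, arg):
        with self.lock:
            self.pending.append((func, arg))

    def make_pending_calls(self, tid):
        """Called periodically from the eval loop; a no-op off the active thread."""
        with self.lock:
            if tid != self.active_thread:
                return
            calls, self.pending = self.pending, []
        for func, arg in calls:
            func(arg)
```

This only models the bookkeeping, not the "spin up a thread at finalization" or "commandeer another interpreter's active thread" variants.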
Observations:
* in the case of #1 (and #2), if an interpreter has no active thread
then pending calls will never be made until interpreter finalization
* it may be worth keeping a list of all active thread IDs for each interpreter
* an interpreter (A) may be run in a thread where another interpreter
(B) is already running, in effect interrupting that first interpreter
+ this is how PEP 554 operates
+ when that happens, interpreter A will have the same active thread
as the interrupted interpreter B (assuming no other threads are
running)
* exceptions that bubble out of a pending call will surface in a
different thread from the one where the call was scheduled
+ this will likely confuse Python users
+ we should not expect this to happen often (if ever)
* always commandeering the main thread (in the main interpreter), in
#2, would simplify matters, but may be disruptive to performance of
that thread (probably not enough to worry about though)
If we take an "active" thread approach then our next steps for our
"pending calls" patch would look like this:
1. move the pending calls state to PyInterpreterState [mostly done]
a. fix PyEval_ReInitThreads() (may be unnecessary due to "active" threads)
2. pass a PyInterpreterState around in ceval.c (and ceval_gil.h)
instead of repeatedly calling PyThreadState_Get()
3-(N-1). switch to making pending calls in the "active" thread
N. interpreter refcount
What do you think?
...
- freelists
- caches
- static types (& declarations, e.g. methods, structseq fields)
- these do not change under normal operation?
- wrap per interpreter? resolve via a basic form of COW?
- singletons (None, Ellipsis, etc.)
- interned objects
- _Py_IDENTIFIER
- ...
Also see: https://docs.python.org/3.5/c-api/init.html#process-wide-parameters
block them? hide them? make them read-only?
How is this handled in other languages that support multiple runtimes in a process? (Lua? .NET application domains?)
https://msdn.microsoft.com/en-us/library/2bh4z9hs(v=vs.110).aspx
- open file descriptors
- env vars
- memory (e.g. malloc, Python allocators, etc.)
- store pointer
- good: simpler, faster
- bad: crash if interpreter already destroyed
- requires a per-interpreter refcount
- store ID
- good: safer
- bad: failure when the interpreter is already destroyed surfaces at a distance from where the interpreter was recorded
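The store-ID tradeoff can be illustrated with a toy registry (illustrative names only): the recorded ID never dangles, but resolving it fails at the point of use, far from where the ID was recorded.

```python
# Toy model of the "store ID" option: IDs resolve through a registry, so a
# destroyed interpreter produces a clean error instead of a dangling pointer.
_interpreters = {}   # id -> interpreter object
_next_id = 0

class InterpreterDestroyedError(Exception):
    pass

def create_interpreter():
    global _next_id
    _next_id += 1
    _interpreters[_next_id] = {"id": _next_id}
    return _next_id

def destroy_interpreter(interp_id):
    del _interpreters[interp_id]

def resolve(interp_id):
    """Look up a recorded ID.  The failure happens here, at a distance
    from wherever the ID was originally recorded."""
    try:
        return _interpreters[interp_id]
    except KeyError:
        raise InterpreterDestroyedError(interp_id) from None
```

The "store pointer" option is the same lookup without the registry: faster, but `resolve()` on a destroyed interpreter becomes a crash unless a per-interpreter refcount keeps the struct alive.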
options:
- keep global state (in PyRuntimeState) along with per-interpreter state
- in special cases, treat the state of the main interpreter as global
- tracing
- as a global default
- ? warnings
- it would have to be global (not main)
- what would global "warnings" config mean? default?
- ? atexit
- set on main interpreter at C level (no objects owned by subinterpreters can be involved)
- pending calls
- main interpreter already used for signal handlers
- "global" also means "any interpreter"? (not likely)
- ? gc / allocators / etc.
- do we need global memory mgmt. for anything? (maybe)
- ? GIL
- probably not; we'll use granular locks for cross-interpreter stuff
- pending calls
- e.g. for decref of shared objects
- tracing?
- allow one interpreter to manage others
- atexit?
- allow one interpreter to manage others
- gc?
- allow one interpreter to manage others
See the previous question for context.
- move pending calls to PyThreadState?
- also support per-thread tracing?
- should the recursion limit (see PyRuntimeState.ceval.recursion_limit) be per-interpreter? both (global as default)?
Signal handlers are triggered through the main interpreter. However, the registered functions should run in the interpreter where the function was created (i.e. the "owning" interpreter). This is due to the isolation requirements of subinterpreters when they do not share the GIL: objects cannot be used outside their owning interpreter.
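The routing described above can be sketched as follows (a toy model with illustrative names): the low-level trigger fires in the main interpreter, which only *enqueues* the registered handler as a pending call on the interpreter that owns it, never touching the owner's objects directly.

```python
# Toy model: signal delivery in the main interpreter schedules the handler
# on the owning interpreter's pending-call queue.
class Interp:
    def __init__(self, name):
        self.name = name
        self.pending = []        # queued (func, arg) pending calls

    def run_pending(self):
        """Drain pending calls; in real CPython this happens in the eval loop."""
        calls, self.pending = self.pending, []
        for func, arg in calls:
            func(arg)

_handlers = {}   # signum -> (owning interpreter, handler func)

def register_handler(signum, owner, func):
    _handlers[signum] = (owner, func)

def trigger_signal(signum):
    """Runs in the main interpreter; it must not call into another
    interpreter's objects directly, only schedule the call."""
    owner, func = _handlers[signum]
    owner.pending.append((func, signum))
```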
...
- Go uses some
- fail blocking send (on channel) if no other interpreters attached (for recv)?
- likewise for recv?
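The failure mode floated above might look like this (a sketch under assumed semantics; `ChannelClosedError` and the attachment sets are hypothetical, not PEP 554's actual API): a blocking send raises immediately when no *other* interpreter is attached for recv, rather than blocking forever, and likewise for recv.

```python
import collections

class ChannelClosedError(Exception):
    pass

class Channel:
    def __init__(self):
        self.recv_attached = set()     # interpreter IDs attached for recv
        self.send_attached = set()     # interpreter IDs attached for send
        self.queue = collections.deque()

    def send(self, interp_id, obj):
        # Fail fast: nobody else could ever receive this object.
        if not (self.recv_attached - {interp_id}):
            raise ChannelClosedError("no other interpreter attached for recv")
        self.queue.append(obj)

    def recv(self, interp_id):
        # Fail fast on an empty channel with no possible sender.
        if not self.queue and not (self.send_attached - {interp_id}):
            raise ChannelClosedError("no other interpreter attached for send")
        return self.queue.popleft()
```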
In a Python with multiple active allocators, could we use pointer comparison/masking to find owning allocator?
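One way that could work (a sketch, assuming each allocator hands out memory from arenas aligned to a power-of-two size; all names are illustrative): mask off the low bits of any pointer to get its arena base, where the owning allocator is recorded.

```python
# Pointer-masking sketch: arena-aligned allocation lets one mask plus one
# lookup map any pointer back to its owning allocator.
ARENA_SIZE = 1 << 20             # 1 MiB arenas (illustrative)
ARENA_MASK = ~(ARENA_SIZE - 1)   # clears the low bits of an address

_arena_owner = {}                # arena base address -> owning allocator

def register_arena(base, allocator):
    assert base % ARENA_SIZE == 0, "arena must be ARENA_SIZE-aligned"
    _arena_owner[base] = allocator

def owning_allocator(ptr):
    """Find the allocator for any pointer with one mask and one dict lookup."""
    return _arena_owner[ptr & ARENA_MASK]
```

In C this would be pointer arithmetic on real addresses; the dict stands in for a header stored at the arena base.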
...
...
Andrew Godwin's Django Channels proposal:
- https://gist.github.com/andrewgodwin/b3f826a879eb84a70625
- https://twitter.com/ncoghlan_dev/status/610773738439643137?t=1&cn=bWVudGlvbg%3D%3D&sig=bd26c0bf2333b51ff97e1974d6f864651118d1e7&al=1&refsrc=email&iid=14e0f010f77847a89079b6e24369297e&autoactions=1434454781&uid=439418320&nid=4+1489
Graham's thoughts:
- like Go
- Just an event loop and associated tasks?
...
...
...
- esp. C extensions
- GIL protects a little
- lack of C globals isolation between interpreters is a problem (modules imported in each)
- no namespaces :(
- https://github.com/pyca/cryptography/issues/2299
...
...
...
...