Open Questions
The idea is to introduce a mode in which using a subinterpreter is less perilous. This mode could be optional for subinterpreters (at the C level), though it's probably not worth it.
- disallow threading
- disallow forking (already done regardless)
- eliminate (or ignore) the GIL within each subinterpreter
- start the subinterpreter in its own thread (and pin it there)
- (maybe) add mode management
From email to @emilydmorehouse on 14 July 2018:
I've been thinking through what the "main" thread is for a given
interpreter. I have some ideas and wanted to share them before I get
too far into it.
Here's a summary of what I'm imagining:
* there is only one "main" thread: the one where the main interpreter
started running (same as now)
* we add a "_PyRuntimeState.main_thread" field to record this
* each interpreter has an "active" thread where pending calls are made
* the "active" thread of the main interpreter will always be
"_PyRuntimeState.main_thread"
"active" isn't necessarily the right description for the thread, so
I'm open to better names. It's better than my initial choice,
"manager". :)
Based on the idea of an "active" thread, here are some options:
1. each interpreter runs pending calls in its active thread
* this is a continuation of what we did Friday
* we rename "PyInterpreterState.ceval.pending.main_thread" to
"active_thread" (hence pending calls will be made in the active thread
only)
* operations on "active_thread" would be protected by a lock
* we do not initialize "active_thread" to the thread where the
interpreter was created
* instead, we initialize "active_thread" to something like
"_PyEval_NO_ACTIVE_THREAD" (probably use a value of -1)
* when the eval loop is run (i.e. when PyEval_EvalFrameEx is
called), we immediately set "active_thread" if it isn't set already
* when the eval loop completes we set "active_thread" to another
thread where the interpreter has an eval loop already running (if any)
or to _PyEval_NO_ACTIVE_THREAD
* when an interpreter is finalized it won't be running so
"active_thread" will necessarily be _PyEval_NO_ACTIVE_THREAD
* at that point if there are any pending calls then we create a
new thread under the interpreter and make the pending calls there
2. like #1, but at finalization we:
* create a new PyThreadState to use for the pending calls
* temporarily commandeer the active thread of another interpreter
(we're guaranteed that the main interpreter will have an active
thread)
* switch to the new thread state and make the pending calls
* switch back to running the original interpreter
3. like #1, but we guarantee that pending calls are always made "soon"
* in our new _Py_Add_Pending_Call() we check if "active_thread" is
set to _PyEval_NO_ACTIVE_THREAD
* if it is then we take the approach from the last point of #1
(finalization) and immediately spin up a new thread to make the pending
calls
4. like #3, but we take the approach from #2 and temporarily
commandeer the active thread of another interpreter to make the
pending calls
* at that point, since we've already taken the hit by creating that
thread state we should consider preserving it to use again in the same
situation
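The "active thread" mechanics shared by options #1-#4 can be sketched as a toy Python model (all names are illustrative; `NO_ACTIVE_THREAD` mirrors the proposed `_PyEval_NO_ACTIVE_THREAD` sentinel and is an assumption, not a real CPython API):

```python
import threading

# Sentinel for "no active thread"; the email suggests a value of -1.
NO_ACTIVE_THREAD = -1

class InterpreterState:
    """Toy model of option #1: pending calls run only in the "active" thread."""

    def __init__(self):
        self.lock = threading.Lock()          # protects active_thread
        self.active_thread = NO_ACTIVE_THREAD
        self.running_threads = set()          # threads with an eval loop running
        self.pending = []                     # queued (func, arg) pending calls

    def eval_loop_entered(self, tid):
        with self.lock:
            self.running_threads.add(tid)
            if self.active_thread == NO_ACTIVE_THREAD:
                self.active_thread = tid

    def eval_loop_exited(self, tid):
        with self.lock:
            self.running_threads.discard(tid)
            if self.active_thread == tid:
                # Hand off to another thread with a running eval loop, if any.
                self.active_thread = (next(iter(self.running_threads))
                                      if self.running_threads
                                      else NO_ACTIVE_THREAD)

    def add_pending_call(self, func, arg):
        with self.lock:
            self.pending.append((func, arg))

    def make_pending_calls(self, tid):
        """Called periodically from the eval loop; a no-op off the active thread."""
        with self.lock:
            if tid != self.active_thread:
                return
            calls, self.pending = self.pending, []
        for func, arg in calls:
            func(arg)
```

This only models the bookkeeping, not the "spin up a thread at finalization" or "commandeer another interpreter's active thread" variants.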
Observations:
* in the case of #1 (and #2), if an interpreter has no active thread
then pending calls will never be made until interpreter finalization
* it may be worth keeping a list of all active thread IDs for each interpreter
* an interpreter (A) may be run in a thread where another interpreter
(B) is already running, in effect interrupting that first interpreter
+ this is how PEP 554 operates
+ when that happens, interpreter A will have the same active thread
as the interrupted interpreter B (assuming no other threads are
running)
* exceptions that bubble out of a pending call will surface in a
different thread from the one where the call was scheduled
+ this will likely confuse Python users
+ we should not expect this to happen often (if ever)
* always commandeering the main thread (in the main interpreter), in
#2, would simplify matters, but may be disruptive to performance of
that thread (probably not enough to worry about though)
If we take an "active" thread approach then our next steps for our
"pending calls" patch would look like this:
1. move the pending calls state to PyInterpreterState [mostly done]
a. fix PyEval_ReInitThreads() (may be unnecessary due to "active" threads)
2. pass a PyInterpreterState around in ceval.c (and ceval_gil.h)
instead of repeatedly calling PyThreadState_Get()
3-(N-1). switch to making pending calls in the "active" thread
N. interpreter refcount
What do you think?
...
- freelists
- caches
- static types (& declarations, e.g. methods, structseq fields)
- these do not change under normal operation?
- wrap per interpreter? resolve via a basic form of COW?
- singletons (None, Ellipsis, etc.)
- interned objects
- _Py_IDENTIFIER
- ...
Also see: https://docs.python.org/3.5/c-api/init.html#process-wide-parameters
block them? hide them? make them read-only?
How is this handled in other languages that support multiple runtimes in a process? (Lua? .NET application domains?)
https://msdn.microsoft.com/en-us/library/2bh4z9hs(v=vs.110).aspx
- open file descriptors
- env vars
- memory (e.g. malloc, Python allocators, etc.)
- store pointer
- good: simpler, faster
- bad: crash if interpreter already destroyed
- requires a per-interpreter refcount
- store ID
- good: safer
- bad: failure when the interpreter is already destroyed surfaces at a distance from where the interpreter was recorded
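The store-ID tradeoff can be illustrated with a toy registry (illustrative names only): the recorded ID never dangles, but resolving it fails at the point of use, far from where the ID was recorded.

```python
# Toy model of the "store ID" option: IDs resolve through a registry, so a
# destroyed interpreter produces a clean error instead of a dangling pointer.
_interpreters = {}   # id -> interpreter object
_next_id = 0

class InterpreterDestroyedError(Exception):
    pass

def create_interpreter():
    global _next_id
    _next_id += 1
    _interpreters[_next_id] = {"id": _next_id}
    return _next_id

def destroy_interpreter(interp_id):
    del _interpreters[interp_id]

def resolve(interp_id):
    """Look up a recorded ID.  The failure happens here, at a distance
    from wherever the ID was originally recorded."""
    try:
        return _interpreters[interp_id]
    except KeyError:
        raise InterpreterDestroyedError(interp_id) from None
```

The "store pointer" option is the same lookup without the registry: faster, but `resolve()` on a destroyed interpreter becomes a crash unless a per-interpreter refcount keeps the struct alive.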
options:
- keep global state (in PyRuntimeState) along with per-interpreter state
- in special cases, treat the state of the main interpreter as global
- tracing
- as a global default
- ? warnings
- it would have to be global (not main)
- what would global "warnings" config mean? default?
- ? atexit
- set on main interpreter at C level (no objects owned by subinterpreters can be involved)
- pending calls
- main interpreter already used for signal handlers
- "global" also means "any interpreter"? (not likely)
- ? gc / allocators / etc.
- do we need global memory mgmt. for anything? (maybe)
- ? GIL
- probably not; we'll use granular locks for cross-interpreter stuff
- pending calls
- e.g. for decref of shared objects
- tracing?
- allow one interpreter to manage others
- atexit?
- allow one interpreter to manage others
- gc?
- allow one interpreter to manage others
See the previous question for context.
- move pending calls to PyThreadState?
- also support per-thread tracing?
- should the recursion limit (see PyRuntimeState.ceval.recursion_limit) be per-interpreter? both (global as default)?
Signal handlers are triggered through the main interpreter. However, the registered functions should run in the interpreter where the function was created (i.e. the "owning" interpreter). This is due to the isolation requirements of subinterpreters when they do not share the GIL: objects cannot be used outside their owning interpreter.
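The routing described above can be sketched as follows (a toy model with illustrative names): the low-level trigger fires in the main interpreter, which only *enqueues* the registered handler as a pending call on the interpreter that owns it, never touching the owner's objects directly.

```python
# Toy model: signal delivery in the main interpreter schedules the handler
# on the owning interpreter's pending-call queue.
class Interp:
    def __init__(self, name):
        self.name = name
        self.pending = []        # queued (func, arg) pending calls

    def run_pending(self):
        """Drain pending calls; in real CPython this happens in the eval loop."""
        calls, self.pending = self.pending, []
        for func, arg in calls:
            func(arg)

_handlers = {}   # signum -> (owning interpreter, handler func)

def register_handler(signum, owner, func):
    _handlers[signum] = (owner, func)

def trigger_signal(signum):
    """Runs in the main interpreter; it must not call into another
    interpreter's objects directly, only schedule the call."""
    owner, func = _handlers[signum]
    owner.pending.append((func, signum))
```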
...
- Go uses some
- fail blocking send (on channel) if no other interpreters attached (for recv)?
- likewise for recv?
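The failure mode floated above might look like this (a sketch under assumed semantics; `ChannelClosedError` and the attachment sets are hypothetical, not PEP 554's actual API): a blocking send raises immediately when no *other* interpreter is attached for recv, rather than blocking forever, and likewise for recv.

```python
import collections

class ChannelClosedError(Exception):
    pass

class Channel:
    def __init__(self):
        self.recv_attached = set()     # interpreter IDs attached for recv
        self.send_attached = set()     # interpreter IDs attached for send
        self.queue = collections.deque()

    def send(self, interp_id, obj):
        # Fail fast: nobody else could ever receive this object.
        if not (self.recv_attached - {interp_id}):
            raise ChannelClosedError("no other interpreter attached for recv")
        self.queue.append(obj)

    def recv(self, interp_id):
        # Fail fast on an empty channel with no possible sender.
        if not self.queue and not (self.send_attached - {interp_id}):
            raise ChannelClosedError("no other interpreter attached for send")
        return self.queue.popleft()
```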
In a Python with multiple active allocators, could we use pointer comparison/masking to find owning allocator?
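One way that could work (a sketch, assuming each allocator hands out memory from arenas aligned to a power-of-two size; all names are illustrative): mask off the low bits of any pointer to get its arena base, where the owning allocator is recorded.

```python
# Pointer-masking sketch: arena-aligned allocation lets one mask plus one
# lookup map any pointer back to its owning allocator.
ARENA_SIZE = 1 << 20             # 1 MiB arenas (illustrative)
ARENA_MASK = ~(ARENA_SIZE - 1)   # clears the low bits of an address

_arena_owner = {}                # arena base address -> owning allocator

def register_arena(base, allocator):
    assert base % ARENA_SIZE == 0, "arena must be ARENA_SIZE-aligned"
    _arena_owner[base] = allocator

def owning_allocator(ptr):
    """Find the allocator for any pointer with one mask and one dict lookup."""
    return _arena_owner[ptr & ARENA_MASK]
```

In C this would be pointer arithmetic on real addresses; the dict stands in for a header stored at the arena base.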
...
...
Andrew Godwin's Django Channels proposal:
- https://gist.github.com/andrewgodwin/b3f826a879eb84a70625
- https://twitter.com/ncoghlan_dev/status/610773738439643137?t=1&cn=bWVudGlvbg%3D%3D&sig=bd26c0bf2333b51ff97e1974d6f864651118d1e7&al=1&refsrc=email&iid=14e0f010f77847a89079b6e24369297e&autoactions=1434454781&uid=439418320&nid=4+1489
Graham's thoughts:
- like Go
- Just an event loop and associated tasks?
...
...
...
- esp. C extensions
- GIL protects a little
- lack of C globals isolation between interpreters is a problem (modules imported in each)
- no namespaces :(
- https://github.com/pyca/cryptography/issues/2299
...
...
...
...