What's the Point?
There are two main things this project is trying to accomplish:
- improve Python's multi-core story
- fill the gap in supported concurrency models
The proposed solution will resolve both but requires effort and expands the language a little. So it's important that there is a clear justification. Why is it worth it?
Benefits
Here's a summary (further elaboration below):
- true multi-core parallelism in CPython
- human-friendly concurrency model
- combines the benefits of threads (in-process) and multiprocessing (isolation)
- prerequisites provide many secondary improvements
  - most are worth doing regardless
  - actual GIL-focused effort is relatively small
  - enables other runtime/language improvements
- provides a low-level, thread-safe, in-process serialization mechanism for Python objects
- does not require substantial changes to CPython's code (subinterpreters already exist in C-API)
- does not break backward compatibility for extension modules
- does not negatively impact performance
  - runtime improvements should actually improve single-threaded performance
- subinterpreters have been part of the CPython C-API for over 20 years
  - we're not ripping the feature out at this point
  - serious projects are starting to use them
  - current users of the subinterpreters C-API will benefit from a stdlib module
- easier for debuggers to handle than multiple processes
Further details may also be found in PEP 554.
Community Support and Feedback

Since I started this project in 2014 (at the encouragement of Nick Coghlan), I've been confident that it's something people want. However, until recently I hadn't really solicited feedback or support, other than in a few mailing list threads.
That changed with the core sprint at Facebook in September 2017. The ball has been rolling since then, including the creation of PEP 554. At the 2018 language summit I presented on this project and the writeup showed up on LWN. Throughout all 9 days of PyCon 2018 I made a concerted effort to explain the project to as many people as possible and to ask for their feedback. I spoke with many folks, including Guido, Davin Potts, Travis Oliphant, Jake VanderPlas, Nathaniel Smith, Lukasz Langa, and other core devs, plus various folks involved in web frameworks. The response was overwhelmingly positive and many folks were visibly excited by the idea. Also, at this point several people are helping me on the project (most notably Emily Morehouse).
The only negative reactions I've gotten from the Python community have been from 3 individuals: Guido, Nathaniel Smith, and Sturla Molden. Guido was unconvinced that subinterpreters provided any improvement over multiprocessing and worried about complexity. Nathaniel echoed those concerns, worried about the extra burden on extension authors, and doubted the goals of the project could be achieved without breaking compatibility. Sturla advocated strongly for multiprocessing (fork w/ COW).
In all honesty, I have serious reservations about the future of this project if Guido doesn't support the idea. He has language design superpowers! So even if he doesn't call the shots anymore, I want Guido's support. This page is, for better or worse, the centerpiece of the effort to make the case (to Guido and anyone else) for the value of this project.
Aside from some of the benefits I've noted above (and concerns below), here are the most interesting bits of feedback I received at PyCon:
- subinterpreters are a more elegant API for concurrency than the alternatives
- threads vs. subinterpreters is the same argument as threads vs. procs
- what is the future of multiprocessing?
- likely that few extension modules will break
- burden on extension module authors is a red herring
- the community will get more excited once there's something they can try out
There are still a number of individuals with whom I'd like to speak regarding how they could make use of subinterpreters in their area of expertise. In the area of numerical/scientific computing I'm hoping to speak with Matt Rocklin (Dask). Most notably I need to get more feedback from web framework folks, e.g. Andrew Godwin.
Concerns / Challenges

Here are the concerns and challenges that have been raised so far; the rebuttals below address them in the same order:

- someone has to do the work :)
- adds cognitive burden to Python users
  - some might see it as yet-another-concurrency-model-to-understand
- adds to Python's maintenance burden
- cost of per-interpreter state
  - subinterpreter startup is relatively expensive vs. multiprocessing (fork + COW)
  - per-interpreter state is relatively expensive vs. threads/multiprocessing
- no segfault isolation
- why not keep using threads/multiprocessing/async/etc.?
- could it break embedders that rely on GIL?
- won't subinterpreters add extra burden on extension module authors?
- subinterpreters add unnecessary complexity to the CPython implementation?
- why not "just" get rid of the GIL?
Rebuttals:
- I'm doing the work, albeit slowly, and others have volunteered to help.
- Every new feature adds to the cognitive burden on Python users. This project does not add much and the benefits to Python users are significant.
- This is also true of every new feature. Furthermore, the net change in maintenance burden should actually be a reduction since a lot of the work involved brings consistency to the runtime code. PEP 554 will help normalize use of subinterpreters in the test suite. Finally, most of this project will involve work that should be done regardless. The remainder to actually facilitate multi-core subinterpreters is quite small.
- This will need to be addressed. However, the extra costs involved are relatively small and should not impact most uses of subinterpreters.
- It's unclear where this would actually be a problem in practice. However, dealing with segfaulting processes isn't a problem that's unique to subinterpreters...
- See "combines the benefits of threads and multiprocessing" below for a direct comparision. Also see "Concurrency Models" for a general look at the pros and cons of each.
- I'd expect this to not be a problem. We will likely still have a GIL for some operations. Furthermore, if an embedder is using a single interpreter then nothing will change for them.
- It shouldn't. Most extension modules will be unaffected by this project. For those that are affected, there are a number of possible ways that we can avoid the problem. Most significantly, extension module authors can indicate that they do not support subinterpreters.
- That ship sailed years ago. We're really no longer in a position to remove subinterpreters from the C-API. Furthermore, they are being used more frequently, including in major projects. At this point we should fix broken corner cases, close gaps, and provide a Python API.
- Good luck with that. :)
More on Benefits

True multi-core parallelism in CPython

This will help resolve concerns in the broader software community about Python's place in a multi-core world. This is achieved through a mechanism (subinterpreters) that is easy to use and to reason about.
Human-friendly concurrency model

The model proposed in PEP 554 is inspired in large part by CSP (Hoare's "Communicating Sequential Processes", the inspiration for Go's concurrency model). This works well for Python because subinterpreters map very well to CSP.
Concurrency through CSP (and message-passing in general) is a fairly well-studied topic in computer science. Certain aspects of the model lend themselves well to meaningful formal verification techniques. It's highly likely that these can be applied to the use of subinterpreters.
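To make the model a bit more concrete, here is a rough sketch of what code using the module proposed in PEP 554 might look like. The module and the names shown (interpreters.create(), Interpreter.run()) follow the draft proposal and are illustrative only; none of this exists in the stdlib yet and the details may change.

```python
# Illustrative sketch based on the draft PEP 554 proposal; the module
# is not in the stdlib yet and the names/signatures may still change.
import interpreters

# Create a new (sub)interpreter in the current process.  It gets its
# own modules and globals and, once this project lands, its own GIL.
interp = interpreters.create()

# Run some code in that interpreter.  It executes in isolation from
# the calling interpreter; nothing is shared implicitly.
interp.run("print('hello from a subinterpreter')")
```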
Prior art for CSP-derived concurrency models includes:
- golang
- JS web workers
- ...
See "Concurrency Models" for more on the strengths and weaknesses of the different Python concurrency models.
Combines the benefits of threads (in-process) and multiprocessing (isolation)

- threads: in-process
- multiprocessing: isolation
Threads are relatively efficient because they can share in-process resources; however, their share-everything nature can cause problems (and CPython has the GIL). Multiprocessing provides useful isolation between processes (and can leverage multiple cores), at the expense of some inefficiency and extra complexity. (See "Concurrency Models".)
Subinterpreters essentially combine the benefits of both while minimizing the downsides. Because they are in-process like threads they avoid a lot of the inefficiency of multiprocessing. Because they are isolated (execution-wise) like multiprocessing they avoid the problems of thread-safety. The relative complexity of subinterpreters is greatly reduced by a simple mechanism for explicitly sharing data/objects (optionally at a synchronization point).
Subinterpreters aren't quite as efficient as threads nor exactly as isolated as multiprocessing. However, they're close enough to make them extremely useful.
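For a feel of the programming model, the closest analogue available in the stdlib today is threads communicating through a queue. The sketch below uses only existing, unmodified stdlib pieces; with subinterpreters the structure would be much the same, except that the workers would be isolated from one another and data would move through channels explicitly rather than through implicitly shared objects.

```python
# Today's closest analogue: threads passing work through queues.
# With subinterpreters the shape of the code would be similar, but the
# workers would be isolated and communication would go through channels.
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work
            break
        results.put(item * 2)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for i in range(10):
    tasks.put(i)
for _ in threads:
    tasks.put(None)  # one sentinel per worker
for t in threads:
    t.join()

print(sorted(results.get() for _ in range(10)))
```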
Prerequisites provide many secondary improvements

Most of the work involved in this project centers on the runtime (and the existing subinterpreters C-API):
- clean up global runtime state
- resolve known issues with runtime initialization & finalization
- improve isolation of subinterpreters
- fix subinterpreter bugs
- help extension modules be more self-contained
The whole effort may be further broken down into a number of granular tasks. The majority of those tasks provide improvements that we want regardless of how they might facilitate the objectives of this project. The remaining few tasks are those that move the GIL to per-interpreter state.
Consequently, a significant side effect of this project is getting all the benefits of the secondary tasks:
- better runtime speed
- lower startup time
- less memory usage
- more consistency in runtime, improving maintainability
- better discoverability in runtime code, improving maintainability
- improved isolation in runtime gives embedders fewer problems
- helps extension modules be more isolated/self-contained
- enables other runtime/language improvements
Provides a low-level, thread-safe, in-process serialization mechanism for Python objects

The problem being solved here (for "message"-passing) is that of efficiently and safely sharing data (or even objects) between otherwise isolated threads-of-execution. That's necessary for this project because, in order to stop sharing the GIL, subinterpreters must (mostly) stop sharing objects.
For the proposed solution, PEP 554 introduces "channels", which are inspired by CSP and (somewhat) Go. However, any approach that solves the underlying "sharing" problem should be just as useful in other problem spaces (e.g. multi-process computing, traditional threads) as it is for subinterpreters.
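As a rough illustration, a channel in the PEP 554 draft is created as a pair of ends, one for receiving and one for sending, and only "shareable" objects (initially just simple data such as bytes) may pass through it. The names below follow the draft and are hypothetical; in real use the two ends would normally be held by different interpreters.

```python
# Hypothetical sketch of the channels proposed in the PEP 554 draft;
# the module does not exist yet and the exact spelling may change.
import interpreters

# A channel is created as a (receiving end, sending end) pair.
recv, send = interpreters.create_channel()

# Only "shareable" objects (initially simple data like bytes) may be
# sent.  In practice one interpreter holds the sending end and another
# holds the receiving end; here both ends are shown together.
send.send(b'spam')
print(recv.recv())  # b'spam'
```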
Does not require substantial changes to CPython's code

Subinterpreters have existed in the C-API for over 20 years. This project builds on that existing functionality and requires the following changes on top:
- clean up the global runtime state (mostly already done)
- move parts of the runtime state into the interpreter state (including the GIL)
Nearly all relevant changes are isolated to the runtime-related code. For the rest it's mostly a matter of changing references-to-the-runtime-state to references-to-the-current-interpreter-state. Very little actual logic must change.
Does not break backward compatibility for extension modules

Extension modules will continue to work under a single interpreter without requiring any changes. Likewise, when used under multiple interpreters very little will change: extensions that currently work should continue working without needing changes.
...
Subinterpreters have been part of the CPython C-API for over 20 years

While subinterpreters have been a part of the C-API for over 2 decades, they haven't gotten a lot of attention from CPython core developers. This is due to a number of reasons:
- not a lot of known projects have used the feature (until recently, only mod_wsgi)
- the relevant C-API isn't well known (nor well described)
- the feature isn't as well tested as the rest of CPython and has some bugs (a catch-22: it isn't well tested because it isn't well known)
- the C-API docs make subinterpreters sound scary
- not many core developers are familiar enough with the runtime code to dive in
- some extension modules don't work right under subinterpreters
However, we are not going to simply remove the feature from Python because:
- the relevant code reflects the most sensible structure for managing the interpreter
- it's a longstanding part of the public C-API (and stable ABI)
- serious projects are already using them
  - mod_wsgi
  - OpenStack Ceph
  - JEP
  - ...
- usage has been increasing in the last several years
Consequently, if folks are using subinterpreters then:
- we should ensure it works correctly
- subinterpreters C-API users would likely find a corresponding stdlib module useful
For reference, here's a rough look at who's using subinterpreters in the broader community (at any scale):
- https://github.com/search?l=C%2B%2B&q=%22Py_NewInterpreter%28%29%22&type=Code
- https://github.com/search?l=C&q=%22Py_NewInterpreter%28%29%22&type=Code
Easier for debuggers to handle than multiple processes

To support subinterpreters, a debugger would essentially do the same thing it already does for threads. Supporting multiprocessing, by contrast, involves a much greater level of complexity.