Description
Presumably the next major release of nbformat (or one shortly thereafter) will require Python 3.6+ (or 3.7+).
Side note: I would actually suggest dropping only CPython 3.6; PyPy 3.6 is still a fine, supported target!
Among the other benefits, this will make use of proper async and await possible, without relying on tornado to get cooperative behavior with downstreams like notebook and jupyter_server, or embedded in more exotic things.
Some places, which are all blocking today, that would likely benefit from being (apparently) asynchronous for certain workloads:
- loading JSON
- parsing JSON
- writing JSON
- validating JSON
Such workloads would include:
- high-throughput use cases, such as nbviewer
  - blocking behavior increases overall latency, which likely compounds unless the downstream uses its own pool (as nbviewer does)
- interactive use, where a scientist (even temporarily) generates notebooks with large/complex outputs and persists them to disk
  - the combination of blocking IO and validation causes significant apparent slowdowns, as other services (e.g. the contents manager) are starved
    - this was discussed around validation (Long validations times, explore using fastjsonschema #190)
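To make the starvation concrete, here is a minimal, stdlib-only sketch. It does not use nbformat: `blocking_work` is a hypothetical stand-in for a slow load/validate call, and `heartbeat` is a stand-in for any other service sharing the event loop. It counts how often the heartbeat gets to run while the work is in progress, first called inline and then offloaded to the default executor.

```python
import asyncio
import time
from typing import List


def blocking_work(seconds: float) -> None:
    """Hypothetical stand-in for a slow, blocking load/validate call."""
    time.sleep(seconds)


async def heartbeat(ticks: List[float]) -> None:
    """Record a tick roughly every 10 ms, as a proxy for other services on the loop."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def measure(offload: bool, seconds: float = 0.2) -> int:
    """Return how many heartbeat ticks happen while the work runs."""
    ticks: List[float] = []
    beat = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # yield once so the heartbeat can start
    start = len(ticks)
    if offload:
        # Run the blocking call in the loop's default thread pool.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, blocking_work, seconds)
    else:
        # Run the blocking call inline: nothing else on the loop can progress.
        blocking_work(seconds)
    count = len(ticks) - start
    beat.cancel()
    return count
```

Running `asyncio.run(measure(offload=False))` and `asyncio.run(measure(offload=True))` side by side shows the difference: the inline call leaves the heartbeat no chance to tick, while the offloaded call lets it keep running.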
Developer Experience
Typographically, it could be something like (assuming IPython-like top-level await for demonstration purposes):
param
import nbformat
# `async` is a reserved keyword as of Python 3.7, so the flag is spelled `async_` here
nb = await nbformat.read('path/to/notebook.ipynb', as_version=4, async_=True)
- this is kind of icky from a typing point of view...
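To illustrate the typing concern, here is a rough sketch of what a flag-switched function might need in order to be typed precisely. Nothing here is nbformat's API: `read`, `_load`, `_aload` and the `Notebook` alias are hypothetical, plain `json` stands in for notebook parsing, and the flag is named `async_` because `async` is a reserved keyword in Python 3.7+. `typing.Literal` requires Python 3.8+.

```python
import asyncio
import json
from typing import Any, Awaitable, Dict, Literal, Union, overload

Notebook = Dict[str, Any]  # hypothetical stand-in for a notebook node type


def _load(path: str) -> Notebook:
    """Blocking read and parse."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


async def _aload(path: str) -> Notebook:
    """Run the blocking read in the default thread pool."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _load, path)


# Two overloads are needed so a type checker can pick the return type
# from the literal value of the flag.
@overload
def read(path: str, async_: Literal[False] = ...) -> Notebook: ...
@overload
def read(path: str, async_: Literal[True]) -> Awaitable[Notebook]: ...


def read(path: str, async_: bool = False) -> Union[Notebook, Awaitable[Notebook]]:
    """Return the notebook, or an awaitable for it when async_ is True."""
    if async_:
        return _aload(path)
    return _load(path)
```

Even with the overloads, a caller that passes a non-literal `bool` gets the `Union` back and has to narrow it by hand, which is roughly the awkwardness noted above.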
proxy
or perhaps, with a full mirroring:
import nbformat.async_ as nbformat
nb = await nbformat.read('path/to/notebook.ipynb', as_version=4)
prefix
or just prefix each function name with "a":
import nbformat
nb = await nbformat.aread('path/to/notebook.ipynb', as_version=4)
Async facade/Configuration
Future asynchronous parsers/validators may arise, but all of the above APIs are currently implemented on top of blocking calls, e.g. json.load or jsonschema.validate. An initial facade would therefore probably be needed, perhaps a ThreadPoolExecutor, which might require configuration, e.g. NBCONVERT_THREADS.
Not even sure what a good default would be, but I'm always a fan of multiprocessing.cpu_count.
Testing/Dependencies
- Given (py3.6|3.7)+, no new runtime dependencies should be required
- Adopting pytest-asyncio is very helpful in verifying async behavior
- Following the lead of IPython/ipykernel, it should consider (and be tested against) alternate event loop implementations, e.g. uvloop, trio
- tracking the performance of the sync/async x loop x workload with asv would be very demonstrative
Downstreams
Of course, no downstreams would be ready to use this today! Some coordination might be necessary to determine any other gotchas that might arise, but this also means it can be released as an "experimental" API relatively quickly. As long as the original API remains basically unchanged, however, there should be no rush to implement in order to catch a compatibility window, even once it is considered "supported".
cc @goanpeca