nbformat asynchronous API? #194

@bollwyvl

Presumably the next major release of nbformat (or one shortly thereafter) will require Python 3.6+ (or 3.7+).

Side note: I would actually suggest dropping only CPython 3.6; PyPy 3.6 is still a fine, supported target!

Among the other benefits, this will make use of proper async and await possible, without relying on tornado to get cooperative behavior with downstreams like notebook and jupyter_server, or embedded in more exotic things.

Some places, which are all blocking today, that would likely benefit from being (apparently) asynchronous for certain workloads:

  • loading JSON
  • parsing JSON
  • writing JSON
  • validating JSON

Such workloads would include

  • high-throughput use cases, such as nbviewer
    • blocking behavior increases overall latency, which likely compounds unless the downstream uses their own pool (as nbviewer does)
  • interactive use, where a scientist (even temporarily) generates notebooks with large/complex outputs and persists them to disk

Developer Experience

Typographically, it could be something like (assuming IPython-like top-level await for demonstration purposes):

param

import nbformat
# `async` is a reserved word as of Python 3.7, so the flag needs another spelling, e.g. `async_`
nb = await nbformat.read('path/to/notebook.ipynb', as_version=4, async_=True)
  • this is kind of icky from a typing point of view...
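To make the typing concern concrete, here is a minimal sketch of what a flag-switched reader might look like. Everything in it is hypothetical: `read`, `_read_sync`, `_read_async` and the `Notebook` alias are stand-ins and not nbformat's actual API, plain `json` stands in for the real parsing, and the flag is spelled `async_` because `async` is a reserved word from Python 3.7 onward. It also assumes Python 3.8+ for `typing.Literal`.

```python
import asyncio
import json
from typing import Any, Awaitable, Dict, Literal, Union, overload

# Hypothetical stand-in for nbformat's notebook type.
Notebook = Dict[str, Any]


def _read_sync(path: str) -> Notebook:
    # Blocking read + parse, as nbformat does today.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


async def _read_async(path: str) -> Notebook:
    # Push the blocking call onto the loop's default thread pool.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _read_sync, path)


# Two overloads are needed so a type checker can tell which
# return type belongs to which value of the flag.
@overload
def read(path: str, *, async_: Literal[False] = ...) -> Notebook: ...
@overload
def read(path: str, *, async_: Literal[True]) -> Awaitable[Notebook]: ...


def read(path: str, *, async_: bool = False) -> Union[Notebook, Awaitable[Notebook]]:
    # One name, two return types: a notebook or an awaitable of one.
    if async_:
        return _read_async(path)
    return _read_sync(path)
```

A caller that passes a plain `bool` variable rather than a literal `True`/`False` gets the `Union` back and has to narrow it by hand, which is probably the "icky" part.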

proxy

or perhaps, with a full mirroring:

import nbformat.async_ as nbformat
nb = await nbformat.read('path/to/notebook.ipynb', as_version=4)

prefix

or just prefix the function names with "a":

import nbformat
nb = await nbformat.aread('path/to/notebook.ipynb', as_version=4)

Async facade/Configuration

While truly asynchronous parsers/validators may arise in the future, all of the above APIs are currently implemented with blocking calls, e.g. json.load or jsonschema.validate. An initial facade would therefore probably be needed, perhaps built on a ThreadPoolExecutor or similar, which might in turn require configuration, e.g. NBCONVERT_THREADS.

I'm not even sure what a good default would be, but I'm always a fan of multiprocessing.cpu_count.

Testing/Dependencies

  • Given (py3.6|3.7)+, no new runtime dependencies should be required
  • Adopting pytest-asyncio is very helpful in verifying async behavior
  • Following the lead of IPython/ipykernel, it should consider (and be tested against) alternate event loop implementations, e.g. uvloop, trio
  • tracking the performance of the sync/async x loop x workload with asv would be very demonstrative

Downstreams

Of course, no downstreams would be ready to use this today! Some coordination might be necessary to determine any other gotchas that might arise, but this also means it can be released as an "experimental" API relatively quickly. As long as the original API remains basically unchanged, however, downstreams should feel no rush to adopt it in order to catch a compatibility window, even once it is considered "supported".

cc @goanpeca
