Description
The current `jupyter_server` project started as a split of the backend parts of the notebook repository. The classic notebook front-end is now installable as a separate package providing a server extension (https://github.com/jupyterlab/nbclassic), and JupyterLab has adopted the new package.
While jupyter_server has changed quite a bit from the original notebook backend, it still includes a lot of the history of the original project.
Problems with the current server
Tornado
IIRC, Tornado was one of the earliest Python web servers to support WebSockets, and it provided a modern async programming model long before asyncio existed. It was adopted broadly in the Jupyter stack, so much so that e.g. ipykernel and jupyter_client depend on Tornado…
However, in my opinion, Tornado has become a liability:
- Its async constructs are now a compatibility layer on top of asyncio, with tricky corner cases that make them awkward to work with.
- Several recent versions of Tornado broke Jupyter in subtle ways, and we've had to patch various Jupyter subprojects to accommodate them.
- It is less actively developed than the alternatives.
Dropping Tornado and building a new server on top of another stack would be a complete reboot of the project, and it would not allow any existing server extension to be used with it.
The current HTTP and WebSocket APIs
HTTP endpoints
The current HTTP endpoints could be improved in several ways. For example, we could work on
- providing pagination for endpoints that return large amounts of data (such as the contents API listing a directory with many files); see the pagination sketch right after this list.
- providing hooks in the contents API so that we can handle e.g. lazy loading of large files, instead of always sending the full content to the front-end.
- breaking the coupling and the tight assumptions that are made between certain HTTP endpoints.
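As an illustration of the pagination point, here is a minimal sketch of what a paginated listing endpoint could look like. The `/api/contents` route shape, the `offset`/`limit` parameters, and the in-memory directory are all hypothetical (shown with FastAPI, which is proposed later in this document):

```python
from fastapi import FastAPI, Query

app = FastAPI()

# Stand-in for a directory with many entries.
FAKE_DIR = [f"file{i}.ipynb" for i in range(10_000)]

@app.get("/api/contents")
async def list_contents(offset: int = 0, limit: int = Query(100, le=1000)):
    # Return only one page of results plus enough metadata for the client
    # to request the next page.
    page = FAKE_DIR[offset : offset + limit]
    return {"items": page, "offset": offset, "total": len(FAKE_DIR)}
```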
The API could also handle certain long-running requests differently, for example by returning immediately with a token that can be used to poll another endpoint for the result.
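The same disclaimer applies to this sketch of the token-based pattern for long-running requests; the endpoint names are made up:

```python
import asyncio
import uuid

from fastapi import FastAPI

app = FastAPI()
results: dict[str, str] = {}

async def long_task(token: str) -> None:
    await asyncio.sleep(10)  # stand-in for the actual long-running work
    results[token] = "done"

@app.post("/api/tasks")
async def start_task() -> dict:
    # Return immediately with a token instead of blocking the request.
    token = uuid.uuid4().hex
    asyncio.create_task(long_task(token))
    return {"token": token}

@app.get("/api/tasks/{token}")
async def poll_task(token: str) -> dict:
    # The client polls this endpoint with the token until the result is ready.
    if token not in results:
        return {"status": "pending"}
    return {"status": "finished", "result": results.pop(token)}
```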
The kernel protocol over WebSocket is inefficient
Another issue with the way the Jupyter server works is the way we communicate with kernels over WebSockets. The main issue, in my opinion, is that while ZMQ messages are serialized as a well-specified sequence of blobs of bytes,
```python
[
    b'u-u-i-d',          # zmq identity(ies)
    b'<IDS|MSG>',        # delimiter
    b'baddad42',         # HMAC signature
    b'{header}',         # serialized header dict
    b'{parent_header}',  # serialized parent header dict
    b'{metadata}',       # serialized metadata dict
    b'{content}',        # serialized content dict
    b'\xf0\x9f\x90\xb1'  # extra raw data buffer(s)
    ...
]
```
the WebSocket protocol communicates this content as a JSON object with keys for `header`, `content`, `metadata`, etc. A consequence of that design is that every message has to be parsed so that we can recompose the ZMQ messages.
If the WebSocket messages contained the same binary blobs as the ZMQ messages, we could route them directly to the right kernel (simply prepending the ZMQ identities and delimiter)... Such an approach would result in considerably faster handling of kernel messages.
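A minimal sketch of that routing, assuming the WebSocket layer hands us the wire-protocol frames as a list of byte blobs (the function names and the framing are hypothetical):

```python
import zmq
import zmq.asyncio

DELIM = b"<IDS|MSG>"

async def ws_to_kernel(frames: list[bytes], shell: zmq.asyncio.Socket,
                       identities: list[bytes]) -> None:
    """Forward [signature, header, parent_header, metadata, content, *buffers]
    to the kernel without deserializing anything: just prepend the ZMQ
    identities and the delimiter."""
    await shell.send_multipart([*identities, DELIM, *frames])

async def kernel_to_ws(shell: zmq.asyncio.Socket) -> list[bytes]:
    """Strip identities and delimiter from an incoming kernel message and
    return the remaining frames, ready to be sent over the WebSocket as-is."""
    msg = await shell.recv_multipart()
    return msg[msg.index(DELIM) + 1:]
```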
A multi-user "single instance" server
In cloud deployments (especially with the new RTC features of JupyterLab), we will probably want some preferences (currently configured via traitlets configurables) to become user-specific and be saved in a database.
For example, themes and workspaces in JupyterLab should probably be set on a per-user basis.
A proposal for a new server
Drop Tornado and reboot the Jupyter server project with a FastAPI-based solution.
Using FastAPI will come with many benefits, such as modern tooling (type annotations, automatic generation of OpenAPI specs, and a rich collection of tools for telemetry and authentication).
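For illustration, a minimal sketch (not actual project code) of a typed endpoint: FastAPI derives both request validation and the OpenAPI spec (served at `/openapi.json`, with interactive docs at `/docs`) from the annotations. The model and route below are hypothetical:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class KernelModel(BaseModel):
    id: str
    name: str
    execution_state: str

@app.get("/api/kernels/{kernel_id}", response_model=KernelModel)
async def get_kernel(kernel_id: str) -> KernelModel:
    # The annotations above are enough for FastAPI to validate the path
    # parameter and document the response schema automatically.
    return KernelModel(id=kernel_id, name="python3", execution_state="idle")
```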
Adopt the “everything is a plugin” approach of JupyterLab for the architecture
Starting from an "empty" base server and a collection of plugins for HTTP endpoints may have important benefits compared to the current approach, where the base server already provides a number of endpoints:
- we could more easily provide alternative implementations of standard endpoints.
- we could make "remixes" of the base plugins, cherry-picking some endpoints from the core server and others from third-party plugins. An example use case could be a "kernels-only" server that would only provide the endpoints for communicating with kernels, as sketched below. This could be an interesting way to deal with remote servers (possibly by enabling other plugins for e.g. LSP, etc.).
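An illustrative sketch (this is not fps's actual API) of composing a server from endpoint plugins, where each plugin contributes a router and a "remix" simply picks the routers it wants:

```python
from fastapi import APIRouter, FastAPI

# A plugin is simply something that contributes a router of endpoints.
kernels_router = APIRouter(prefix="/api/kernels")

@kernels_router.get("")
async def list_kernels() -> list:
    return []  # a real plugin would query its kernel manager

def build_server(routers: list[APIRouter]) -> FastAPI:
    # The base server is an "empty" app; plugins add all the endpoints.
    app = FastAPI()
    for router in routers:
        app.include_router(router)
    return app

# A "kernels-only" remix:
app = build_server([kernels_router])
```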
Prototype implementation
In the past few weeks, @adriendelsalle and @davidbrochart have been working on prototyping such an approach:
- https://github.com/adriendelsalle/fps provides the base server and a simple plugin system
- https://github.com/davidbrochart/jupyverse provides a collection of fps plugins for the main Jupyter functionalities. It can serve JupyterLab, retrolab, and other Jupyter front-ends.
There is still a lot to figure out, naturally:
- We've had several conversations on whether we would like e.g. the base server to always require a database, and plugins to require some tables etc. in that database; plugins with special requirements could always require another database... Presumably, SQLite could be used for a single-machine deployment of Jupyter where the user simply types `jupyter lab` to launch it, while a database running on a separate machine would presumably be specified in the case of cloud deployments. (Using the same database for several plugins would help simplify the configuration; see the sketch after this list.)
- When discussing the project, we have also been thinking about the articulation between the single-instance server and the hub with respect to authentication and authorization. One idea that came out was to "elect" OIDC as the default authentication method, and to use the hub as an OIDC identity provider in the case of hub-based deployments.
- And naturally the whole question of the transition, should we move forward.
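A minimal sketch of the shared-database idea, assuming SQLAlchemy and a per-user settings table (the table, columns, and connection URLs are all hypothetical):

```python
from sqlalchemy import Column, String, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class UserSetting(Base):
    """A table a plugin could contribute to the shared database, e.g. for
    per-user JupyterLab themes and workspaces."""
    __tablename__ = "user_settings"
    user_id = Column(String, primary_key=True)
    key = Column(String, primary_key=True)  # e.g. "theme", "workspace"
    value = Column(String)                  # JSON-encoded setting value

# Single-machine deployment: a local SQLite file is enough.
engine = create_engine("sqlite:///jupyter.db")
# A cloud deployment would instead point at a shared database server, e.g.
# create_engine("postgresql://db.internal/jupyter").
Base.metadata.create_all(engine)
```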
Jupyter_server
We would like to discuss those ideas with the broader `jupyter_server` community, and improve the proposal and the ongoing work based on the group's feedback.
Obviously, we should continue improving `jupyter_server`, but there are many projects that could already benefit from the proposed approach.