Description
Restore Notebook execution progress when a browser page is reloaded
Problem
Jupyter Notebook/Lab does not restore execution progress after the page is reloaded. As a result, there is no way to monitor execution progress or retrieve execution output for long-running notebooks.
This happens for the following reasons:
- Notebook/Lab UI generates a new session id every time a notebook/lab page is reloaded.
  Jupyter Server supports replaying buffered kernel messages to the client after the kernel session is reconnected. Replay is keyed on the session id that the client sets when it connects to the kernel. Because the Notebook/Lab UI generates a new session id on every page load, the client cannot reconnect to the existing kernel session after the notebook page is re-opened, so the buffered messages are never replayed.
- Message ids for submitted code are not persisted in the cells metadata.
  Kernel message ids are not persisted in the notebook metadata and are lost when the notebook window or tab is reloaded. As a result, the frontend code cannot link incoming kernel messages to cells, so it cannot show the output or the execution count.
- Unsaved cell output is missing after a notebook page is reloaded.
  If the kernel message with the execution result has been sent to the browser but the notebook has not been saved, the output is not displayed after the page is reloaded.
Proposed Solution
The proposal is to enable restoring sessions for kernel connections, move the Notebook model (cells metadata) to Jupyter Server, and synchronize it with changes triggered by the Notebook/Lab UI and by the kernel.
Restore kernel sessions
Make the kernel connection independent of the session id provided in the web socket `session_id` URL argument. Today a new session id is generated every time the notebook is reloaded, so buffered kernel messages are not replayed after the Notebook page/tab is re-opened. JupyterLab and Notebook 7 support a collaboration mode that distinguishes between notebook users. The user info and the notebook path could be used together as a session identifier and mapped to the kernel id.
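The mapping described above could look roughly like the sketch below. It is only an illustration: `KernelSessionRegistry`, `bind`, and `lookup` are hypothetical names, not existing Jupyter Server APIs, and the user name and notebook path are assumed to be plain strings.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class KernelSessionRegistry:
    """Hypothetical registry mapping (user, notebook path) to a kernel id."""

    _kernels: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def bind(self, user: str, notebook_path: str, kernel_id: str) -> None:
        # Remember which kernel serves this user's view of the notebook.
        self._kernels[(user, notebook_path)] = kernel_id

    def lookup(self, user: str, notebook_path: str) -> Optional[str]:
        # A page reload presents the same user and path, so the same kernel id
        # is found even though the browser generated a new session id.
        return self._kernels.get((user, notebook_path))


registry = KernelSessionRegistry()
registry.bind("alice", "reports/train.ipynb", "kernel-1")
kernel_after_reload = registry.lookup("alice", "reports/train.ipynb")
```

Because the key no longer contains the browser-generated session id, the lookup after a reload resolves to the kernel that is still running the notebook.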
Move a Notebook model (cells metadata) to Jupyter Server and synchronize it with a kernel and a notebook opened in UI
Storing the notebook model on Jupyter Server will make it possible to:
- restore message ids for all code cells with submitted code after the page is reloaded;
- restore execution progress and output for unsaved changes.
Implementation notes for enabling the Notebook model on Jupyter Server:
- The JupyterLab/Notebook UI sends a message when a cell's state changes in one of the following ways:
  - On Cell Changed
  - On Cell Inserted
  - On Cell Deleted
  - On Cell Cleared
  - On Cell Executed

  The messages are sent via the kernel's web socket connection with a new message type (e.g. `nb_state`). `ZMQChannelsHandler` then parses incoming messages and forwards notebook state related messages to `NotebooksStatesManager`. `NotebooksStatesManager` is responsible for synchronizing the notebook state between the UI and Jupyter Server. It will also handle kernel messages.
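The routing step could be sketched as follows. This is a simplified illustration rather than Jupyter Server code: the `nb_state` message layout (`path`, `event`, `cell_id`, `cell` fields), the event names, and the `route_message` helper are all assumptions, and the manager is a plain in-memory dictionary.

```python
import json
from typing import Any, Dict, List


class NotebooksStatesManager:
    """Hypothetical in-memory store of notebook models, keyed by notebook path."""

    def __init__(self) -> None:
        # path -> cell id -> cell state
        self.models: Dict[str, Dict[str, Dict[str, Any]]] = {}

    def apply(self, content: Dict[str, Any]) -> None:
        cells = self.models.setdefault(content["path"], {})
        event, cell_id = content["event"], content["cell_id"]
        if event == "cell_deleted":
            cells.pop(cell_id, None)
        elif event == "cell_cleared":
            cells.setdefault(cell_id, {"source": ""})["outputs"] = []
        else:  # cell_inserted, cell_changed, cell_executed
            cell = cells.setdefault(cell_id, {"source": "", "outputs": []})
            cell.update(content.get("cell", {}))


def route_message(raw: str, manager: NotebooksStatesManager, to_kernel: List[dict]) -> None:
    """Send notebook state messages to the manager, everything else to the kernel."""
    msg = json.loads(raw)
    if msg["header"]["msg_type"] == "nb_state":
        manager.apply(msg["content"])
    else:
        to_kernel.append(msg)


manager = NotebooksStatesManager()
to_kernel: List[dict] = []
route_message(json.dumps({
    "header": {"msg_type": "nb_state"},
    "content": {"path": "a.ipynb", "event": "cell_inserted",
                "cell_id": "c1", "cell": {"source": "1 + 1"}},
}), manager, to_kernel)
route_message(json.dumps({
    "header": {"msg_type": "execute_request"},
    "content": {"code": "1 + 1"},
}), manager, to_kernel)
```

Ordinary kernel messages pass through untouched, so existing clients that never send `nb_state` would see no change in behavior under this design.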
- When code is submitted for execution, a message id (`msg_id`) is returned to the client (browser). The client tracks execution progress based on the message id. Currently the message id is stored only in the browser; it should not be persisted in the notebook ipynb file because it is relevant only at runtime. Since each cell has a unique id, it is possible to map message ids to cell ids, store the message id for each submitted cell on Jupyter Server, and return it to the client when the notebook is reloaded. The client will then be able to handle incoming kernel messages and display execution progress.
- When the notebook is reloaded, the cells metadata (including output) from the ipynb file is merged with the cells metadata from the notebook model saved on Jupyter Server.
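A minimal sketch of the merge on reload, assuming the server keeps a per-cell dictionary of runtime state keyed by cell id. The `merge_runtime_state` function and the choice to carry the message id under a `msg_id` key in the cell metadata are assumptions made for illustration.

```python
import copy
from typing import Any, Dict


def merge_runtime_state(file_nb: Dict[str, Any],
                        server_cells: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of the notebook read from disk, overlaid with server state."""
    merged = copy.deepcopy(file_nb)
    for cell in merged["cells"]:
        runtime = server_cells.get(cell.get("id"))
        if runtime is None:
            continue  # the server holds nothing newer for this cell
        if "msg_id" in runtime:
            # Hypothetical location for the message id of the submitted code.
            cell.setdefault("metadata", {})["msg_id"] = runtime["msg_id"]
        if "outputs" in runtime:
            # Output received by the server but never saved to the file.
            cell["outputs"] = copy.deepcopy(runtime["outputs"])
    return merged


file_nb = {"cells": [{"id": "c1", "cell_type": "code", "source": "train()",
                      "metadata": {}, "outputs": []}]}
server_cells = {"c1": {"msg_id": "m-42",
                       "outputs": [{"output_type": "stream", "name": "stdout",
                                    "text": "epoch 1\n"}]}}
restored = merge_runtime_state(file_nb, server_cells)
```

With the message id back in the model, the client can match later kernel messages to cell `c1` and continue showing its progress.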
Additional context
The image below shows components and data flow for execution restore logic:
- When the notebook is loaded for the first time, `ContentsManager` creates a copy of the Notebook model on Jupyter Server and sends it to the client (Notebook/Lab UI).
- When a user edits the notebook, the changes are sent to Jupyter Server via the kernel's web socket connection. `ZMQChannelsHandler` parses messages by type (e.g. the `nb_state` type) and forwards messages related to state changes to `NotebooksStatesManager`. `NotebooksStatesManager` updates the notebook model stored on the server.
- When the kernel sends a message, `ZMQChannelsHandler` forwards it to `NotebooksStatesManager` and to the Jupyter Notebook/Lab UI.
- When the notebook page is reloaded, `ContentsManager` loads the notebook file from storage (file system, cloud storage, etc.) and merges it with the notebook model stored on Jupyter Server (including the message ids for submitted code cells). This makes it possible to identify which cells are in the executing state and to restore execution progress.
- When the user saves the notebook, the contents manager removes the message ids from the copy of the notebook model that is written to the file and saves the ipynb file in storage (the `msg_id` parameter still exists in the Notebook model stored on Jupyter Server).