
SUB sockets take time to subscribe to the IOPub channel and miss important messages #593

@SylvainCorlay

This occurs intermittently in, for example:

  • JupyterLab, when hitting the button to "restart the kernel, and re-run the whole notebook"

    [Screenshot: iopub]

    On my setup, this happens roughly one time out of three. In the screenshot above, only the IOPub stream messages from Cell 4 are received. Most often, either all of the output or none of it is shown, but intermediate results like this one also happen.

    Inspecting the messages on the websocket shows that the first IOPub message received is the idle status message whose parent header corresponds to the shutdown_request, even though the kernel sent it after the first three stream messages (from the first cells), so those earlier messages were never received. (This has been reported in jupyterlab/jupyterlab#9008: "Jupyter Lab stalls when choosing 'Restart Kernel and Run All Cells...'".)

  • Voilà, when running a dashboard

    In that case, it happens more rarely. The idle status message for the kernel info request is never received because it is sent before jupyter_client has subscribed to the IOPub channel. Since the connect method in JupyterLab waits for that idle message to resolve its promise, it hangs indefinitely.

I am surprised that we did not hit this before, and I wonder whether this is a regression in jupyter_client or pyzmq, or whether changes in relative timing only recently exposed this race condition.

cc @minrk @jtpio @JohanMabille @afshin

EDIT:

Fixes are needed independently wherever we subscribe to channels. The approach is to nudge the kernel with kernel_info requests until something is received on IOPub.
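The nudging approach from the edit above can be sketched as a polling loop. This is a minimal, hypothetical sketch, not the actual jupyter_client fix: `nudge_until_iopub` and `FakeKernel` are illustrative names, and the loop takes two injected callables so it runs without a real kernel. With jupyter_client's `BlockingKernelClient`, `send_kernel_info` could plausibly wrap `client.kernel_info()` and `poll_iopub` could wrap `client.get_iopub_msg(timeout=...)`, returning `None` on `queue.Empty`; that wiring is an assumption. `FakeKernel` is a toy model of the race described above: status messages published before the subscription is live are simply dropped.

```python
import time
from typing import Callable, List, Optional


def nudge_until_iopub(
    send_kernel_info: Callable[[], None],
    poll_iopub: Callable[[float], Optional[dict]],
    timeout: float = 10.0,
    interval: float = 0.2,
) -> dict:
    """Hypothetical helper: send kernel_info requests until IOPub delivers a message.

    send_kernel_info() fires one kernel_info_request on the shell channel.
    poll_iopub(t) waits up to t seconds and returns an IOPub message or None.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("no IOPub message received; SUB socket may not be subscribed")
        # Each request makes the kernel publish busy/idle status on IOPub.
        send_kernel_info()
        msg = poll_iopub(min(interval, remaining))
        if msg is not None:
            # Receiving anything proves the subscription is now established.
            return msg


class FakeKernel:
    """Toy model (assumption): the SUB socket only joins after `join_after` requests."""

    def __init__(self, join_after: int) -> None:
        self.join_after = join_after
        self.requests = 0
        self.queue: List[dict] = []

    def send_kernel_info(self) -> None:
        self.requests += 1
        # Messages published before the subscriber has joined are lost,
        # like PUB/SUB messages sent before the subscription propagates.
        if self.requests > self.join_after:
            self.queue.append({"msg_type": "status", "execution_state": "busy"})
            self.queue.append({"msg_type": "status", "execution_state": "idle"})

    def poll_iopub(self, timeout: float) -> Optional[dict]:
        # The fake returns immediately instead of blocking for `timeout`.
        return self.queue.pop(0) if self.queue else None


kernel = FakeKernel(join_after=3)
first = nudge_until_iopub(kernel.send_kernel_info, kernel.poll_iopub, timeout=1.0)
print(first["execution_state"], kernel.requests)  # busy 4
```

Because the kernel may answer several nudges, a real client would likely also need to discard the extra status messages (and the kernel_info replies on the shell channel) whose parent headers match the nudge requests. Using `time.monotonic()` keeps the deadline unaffected by wall-clock changes.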
