
Improving Network Performance #312

@mlucool

How can we optimally transfer assets to Jupyter clients (web browsers)?

Hypothesis: HTTP/2 (i.e. no head-of-line blocking) and compression would meaningfully improve page-load and large-notebook-load performance.

Experiment: Create an nginx config that adds SSL, HTTP/2, and compression, and use it as a simple reverse proxy in front of a JupyterLab 2.x server. Then use Chrome DevTools to understand the changes in performance. In this setup my server and browser are not in the same physical location, but are connected by a high-speed network. I had exactly one location block, so static assets still came via Tornado.
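
For reference, a minimal sketch of this kind of config, assuming a Tornado server on 127.0.0.1:8888; the hostname and certificate paths are placeholders, not the exact setup used:

```nginx
# Hypothetical reverse proxy adding SSL, HTTP/2, and compression in front
# of a JupyterLab/Tornado server listening on :8888.
server {
    listen 443 ssl http2;
    server_name jupyter.example.com;                    # placeholder hostname

    ssl_certificate     /etc/nginx/certs/jupyter.crt;   # placeholder paths
    ssl_certificate_key /etc/nginx/certs/jupyter.key;

    # On-the-fly compression; notebooks are JSON, bundles are JS.
    gzip on;
    gzip_types application/json application/javascript text/css;
    gzip_min_length 1024;

    # Exactly one location block: everything, including /static,
    # still goes through Tornado (as in the experiment).
    location / {
        proxy_pass http://127.0.0.1:8888;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        # WebSocket upgrade so kernels and terminals keep working.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```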

Conclusion: Surprisingly, these technologies, when naively put on top of a jupyterlab@2.x server, did not make a meaningful difference. The reverse proxy decreased the size of small assets but increased page-load time by ~10%. Large assets clearly shrank by a large factor, 10x to 23,000x (the latter being a generated, highly compressible test notebook), but the time to compress them on the fly meant there were minimal gains to be had. The ~10 MB vendors bundle I had was compressed to 2.5 MB but took longer to reach the browser. A 33 MB notebook shrank to 1.6 kB but still took about 30 s either way. I'll note that most of my notebooks are small (<5 MB).

I ran a second experiment where I put the notebook file directly behind the same nginx server. In this case I was able to download the 33 MB notebook in ~100 ms!
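
A sketch of what "directly behind nginx" might look like; the prefix and paths are illustrative, not the exact config used:

```nginx
# Hypothetical: serve the notebook file straight from disk, bypassing
# the Python server entirely (prefix and paths are placeholders).
location /direct/ {
    root /srv/notebooks;   # /direct/foo.json -> /srv/notebooks/direct/foo.json
    gzip on;               # still compress on the fly; nginx is fast at this
    gzip_types application/json;
    default_type application/json;
}
```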

In my view, this experiment points to some large gains that can be had by letting assets skip the Python server, or by thinning out the code path between the two. A few suggestions:

  1. Create a document for how to configure nginx/apache in front of Jupyter Server. This document should include tips on the right settings to skip the Jupyter server for certain assets (e.g. anything in /static); see the first sketch after this list.
  2. If an asset is in /static, Jupyter should treat it as such and set the right headers (today I see no-cache set, for example). Doing (1) should help people do this automatically, but doing it in jupyter_server may be useful for the average case; see the handler sketch after this list.
  3. Dig into where the 30 s goes when sending a large notebook. My guess is we'll need to skip over some steps in Python or preload notebooks in memory, but that's just a hunch and we'll need more data.
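
For (1), a minimal sketch of a location block that skips the Jupyter server for static assets. The alias path is an assumption: where JupyterLab's static files actually live depends on your install, and extensions may serve assets from other directories:

```nginx
# Hypothetical: serve /static straight from disk with long-lived cache
# headers instead of proxying to Tornado. The alias path is a guess and
# varies by install.
location /static/ {
    alias /opt/conda/share/jupyter/lab/static/;   # placeholder path
    expires 7d;
    add_header Cache-Control "public, max-age=604800, immutable";
    gzip on;
    gzip_types application/javascript text/css application/json;
}
```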
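For (2), a minimal Tornado sketch of the kind of change meant here, not current jupyter_server behavior: a StaticFileHandler subclass that overrides set_extra_headers so static assets get cacheable headers instead of no-cache. The route and path are placeholders:

```python
# Minimal sketch (assumed approach, not current jupyter_server code):
# mark /static assets as cacheable via Tornado's set_extra_headers hook.
import tornado.ioloop
import tornado.web


class CachedStaticHandler(tornado.web.StaticFileHandler):
    def set_extra_headers(self, path: str) -> None:
        # Hashed bundles can be cached aggressively rather than no-cache.
        self.set_header("Cache-Control", "public, max-age=604800, immutable")


app = tornado.web.Application([
    # Placeholder static path; a real integration would reuse Jupyter's
    # own static file locations.
    (r"/static/(.*)", CachedStaticHandler,
     {"path": "/opt/conda/share/jupyter/lab/static"}),
])

if __name__ == "__main__":
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()
```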

Pictures are worth 1000 words:
No optimizations page load: [screenshot]

Nginx page load: [screenshot]

No optimizations large notebook: [screenshot]

Nginx large notebook: [screenshot]

Directly sending the notebook (renamed to foo.json): [screenshot]

cc @goanpeca
