'plotly' package contains 123MB of autogenerated code #3294

Open

Thank you for plotly.py, it's definitely worked well for us in our app!

We're deploying our app's backend to AWS Lambda, packaging dependencies in a "layer", which has a 256MB size limit, and we are hitting that limit. Unfortunately, plotly's Python library is huge: for the version we're using there (4.14.3), it comes to 58MB of Python source and ~19MB of JavaScript (plotly.min.js plus the Jupyter plugin). The Python source seems to be almost entirely the auto-generated (AIUI) graph_objs and validators subdirectories. To reduce the size, we've removed the JavaScript files, because the lambdas don't use any of them, but that still leaves a significant amount of Python code.

To make this more concrete, here are the numbers for the latest version on my Mac:

$ pip install plotly==5.1.0
...
$ pip show plotly
...
Location: /SOME/PATH/.../site-packages
...
$ cd /SOME/PATH/.../site-packages # copied from the command above
$ du -sch plotly/* | sort -h
4.0K	plotly/_version.py
4.0K	plotly/_widget_version.py
4.0K	plotly/animation.py
4.0K	plotly/config.py
4.0K	plotly/conftest.py
4.0K	plotly/dashboard_objs.py
4.0K	plotly/exceptions.py
4.0K	plotly/files.py
4.0K	plotly/grid_objs.py
4.0K	plotly/missing_ipywidgets.py
4.0K	plotly/optional_imports.py
4.0K	plotly/presentation_objs.py
4.0K	plotly/serializers.py
4.0K	plotly/session.py
4.0K	plotly/validator_cache.py
4.0K	plotly/version.py
4.0K	plotly/widgets.py
8.0K	plotly/__init__.py
8.0K	plotly/callbacks.py
8.0K	plotly/colors
8.0K	plotly/utils.py
 12K	plotly/shapeannotation.py
 16K	plotly/data
 16K	plotly/plotly
 24K	plotly/graph_objects
 28K	plotly/tools.py
 36K	plotly/basewidget.py
 52K	plotly/subplots.py
 76K	plotly/offline
220K	plotly/basedatatypes.py
264K	plotly/matplotlylib
340K	plotly/__pycache__
344K	plotly/express
364K	plotly/io
664K	plotly/figure_factory
3.5M	plotly/package_data
 43M	plotly/graph_objs
 80M	plotly/validators
129M	total

That is, 123MiB/129MiB (95%) of the package size is the autogenerated graph_objs and validators submodules.
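For anyone who wants to reproduce this breakdown without `du`, here is a rough Python sketch. The helper names (`tree_size_bytes`, `size_by_entry`, `share_of_total`) are made up for illustration and are not part of plotly. It sums apparent file sizes (`st_size`), whereas `du` reports allocated disk blocks, so the absolute numbers will likely come out somewhat smaller than the ones above.

```python
import os
from pathlib import Path


def tree_size_bytes(root: Path) -> int:
    """Sum the apparent sizes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip symlinks so that nothing outside the tree is counted.
            if path.is_file() and not path.is_symlink():
                total += path.stat().st_size
    return total


def size_by_entry(package_dir: Path) -> dict[str, int]:
    """Map each top-level entry of package_dir to its size in bytes."""
    sizes = {}
    for entry in package_dir.iterdir():
        if entry.is_dir():
            sizes[entry.name] = tree_size_bytes(entry)
        elif entry.is_file():
            sizes[entry.name] = entry.stat().st_size
    return sizes


def share_of_total(sizes: dict[str, int], names: tuple[str, ...]) -> float:
    """Fraction of the total size taken up by the named entries."""
    total = sum(sizes.values())
    return sum(sizes.get(n, 0) for n in names) / total if total else 0.0
```

Assuming plotly is installed, something like `share_of_total(size_by_entry(Path(plotly.__file__).parent), ("graph_objs", "validators"))` should give the fraction corresponding to the 95% figure.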

Since these are autogenerated, potentially they could be autogenerated in a way that makes them significantly smaller without changing behaviour or structure. Some ideas:

  • reduce unnecessary whitespace, like empty lines, and, particularly, leading whitespace in doc strings (and potentially other multiline strings) or indentation (one space is enough, rather than 4)
  • other minification techniques, like those supported by https://pypi.org/project/python-minifier

These will require disabling black and generally make the files harder to read, but I don't think they're designed to be human readable anyway?

(There are also other possibilities, like combining multiple files into one so that imports can be shared, but this is probably only a small win and would require changing other code.)
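As a rough illustration of the first idea, here is a minimal sketch of a post-processing pass that drops empty lines and comment-only lines. It is not how plotly's code generator works; `strip_blank_and_comment_lines` is a hypothetical helper. It uses the standard `tokenize` module to find multi-line string literals, so blank lines inside docstrings are left alone and `__doc__` values stay unchanged. Note that it would also drop a shebang or encoding comment, which I'm assuming the generated files don't rely on.

```python
import io
import tokenize


def strip_blank_and_comment_lines(source: str) -> str:
    """Drop blank and comment-only lines that lie outside multi-line tokens."""
    # Line numbers covered by any token spanning several lines
    # (in practice: triple-quoted strings such as docstrings).
    protected = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.end[0] > tok.start[0]:
            protected.update(range(tok.start[0], tok.end[0] + 1))

    kept = []
    for lineno, line in enumerate(source.splitlines(keepends=True), start=1):
        stripped = line.strip()
        is_droppable = not stripped or stripped.startswith("#")
        if is_droppable and lineno not in protected:
            continue
        kept.append(line)
    return "".join(kept)
```

Shrinking the indentation or dedenting docstrings would need more care (the latter changes the docstring text itself), so this sketch leaves both alone.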

For example, starting with https://github.com/plotly/plotly.py/blob/v5.1.0/packages/python/plotly/plotly/graph_objs/bar/_stream.py one could save ~20%: https://gist.github.com/huonw/4b81b6825ebd508bbcd39f4bb2215f4e

state                                              size (bytes)  relative size
original                                                   4104           100%
no leading whitespace in doc-strings                       3792            92%
no empty lines or lines with only # ---- comments          3522            86%
1 space indent                                             3201            78%

Assuming this 20% decrease generalises across all the autogenerated files (123MB in total), this would cut nearly 25MB off the 129MB package.
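To spell out that estimate, here is the arithmetic as a small Python sketch, using only the figures from the `du` output and the table above. It assumes the reduction seen on `_stream.py` carries over to every generated file, which has not been checked.

```python
# Figures taken from the du output and the size table above.
GENERATED_MIB = 43 + 80   # graph_objs + validators
ORIGINAL_BYTES = 4104     # _stream.py as generated
MINIFIED_BYTES = 3201     # after all three whitespace changes


def estimated_saving_mib(generated_mib: float, reduction: float) -> float:
    """Saving if every generated file shrinks by the same fraction."""
    return generated_mib * reduction


measured_reduction = 1 - MINIFIED_BYTES / ORIGINAL_BYTES
conservative = estimated_saving_mib(GENERATED_MIB, 0.20)
measured = estimated_saving_mib(GENERATED_MIB, measured_reduction)

# prints "24.6 MiB at 20%, 27.1 MiB at 22%"
print(f"{conservative:.1f} MiB at 20%, {measured:.1f} MiB at {measured_reduction:.0%}")
```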

(Thanks again for plotly!)

Labels: P3 (backlog), bug (something broken), infrastructure (build process etc.)