Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy Imports #7179

Merged
merged 18 commits into from
Oct 28, 2022
Merged

Lazy Imports #7179

merged 18 commits into from
Oct 28, 2022

Conversation

headtr1ck
Copy link
Collaborator

@headtr1ck headtr1ck commented Oct 17, 2022

  • Hopefully Closes Long import time #6726
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

@github-actions github-actions bot added io topic-backends topic-zarr Related to zarr storage library labels Oct 17, 2022
@headtr1ck headtr1ck added run-benchmark Run the ASV benchmark workflow and removed topic-backends topic-zarr Related to zarr storage library io labels Oct 17, 2022
@max-sixty
Copy link
Collaborator

Very nice @headtr1ck !

@github-actions github-actions bot added io topic-backends topic-zarr Related to zarr storage library and removed run-benchmark Run the ASV benchmark workflow labels Oct 18, 2022
@headtr1ck
Copy link
Collaborator Author

Not as much of an improvement as I thought, but shaving off another 20ms.

I can see if I can bring it down even further...

@max-sixty
Copy link
Collaborator

Do you think there's a risk something gets added back by mistake?

A galaxy brain idea is to have a lint in pre-commit which looks for ^import (foo|bar) for the libraries we should be lazy loading. Not fool proof. Or something to measure import time so we can see if it rises again.

@headtr1ck
Copy link
Collaborator Author

The simplest solution would be to add a simple unit test that checks that blacklisted packets are not in sys.modules.

For import timings we have asv, but that is not run so often.

@headtr1ck headtr1ck added the run-benchmark Run the ASV benchmark workflow label Oct 22, 2022
@github-actions github-actions bot removed the run-benchmark Run the ASV benchmark workflow label Oct 22, 2022
@headtr1ck headtr1ck changed the title Lazy Backends Lazy Imports Oct 22, 2022
@headtr1ck
Copy link
Collaborator Author

headtr1ck commented Oct 22, 2022

Now I have made every module lazy that is not too much work, this includes:
matplotlib, flox and all backends.

The only remaining large import is dask.array, here I have to check if that is possible to make lazy?
This alone is 500ms import time, since it also imports numba, scipy and sparse... Maybe we could even open a ticket at dask about this topic as well?

@headtr1ck
Copy link
Collaborator Author

headtr1ck commented Oct 22, 2022

Seems like it is but a good idea to mess with sys.modules in pytest. Anyone got an idea how to solve that?

Edit: solved :)

@headtr1ck headtr1ck added the run-benchmark Run the ASV benchmark workflow label Oct 22, 2022
@github-actions github-actions bot added topic-arrays related to flexible array support topic-dask topic-indexing and removed run-benchmark Run the ASV benchmark workflow labels Oct 22, 2022
@headtr1ck headtr1ck added the run-benchmark Run the ASV benchmark workflow label Oct 22, 2022
@headtr1ck
Copy link
Collaborator Author

I managed to get dask.array lazy as well, but dask.utils is still imported by backends.locks.

Someone else has to tackle this, I am not too familiar with this...

This got the import time quite low. I am happy with this now :)

@headtr1ck
Copy link
Collaborator Author

It seems that cfgrib has its own xarray_plugin now.
Should the internal version be deprecated?

Copy link
Contributor

@Illviljan Illviljan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! I've felt the long import times recently and this will help greatly!

Maybe next step in a future PR is to try making pandas lazy as well? I don't think it's used as much as the top import in every file makes it out to be.

xarray/core/utils.py Outdated Show resolved Hide resolved
xarray/core/utils.py Outdated Show resolved Hide resolved
@headtr1ck headtr1ck added the plan to merge Final call for comments label Oct 23, 2022
@dcherian
Copy link
Contributor

Thanks @headtr1ck great PR!

@dcherian dcherian merged commit f32d354 into pydata:main Oct 28, 2022
@hmaarrfk
Copy link
Contributor

Exciting improvements on usability for the next version!

@dcherian
Copy link
Contributor

dcherian commented Nov 16, 2022

FYI I'm getting this error when trying to update the version in pyodide: https://app.circleci.com/pipelines/github/pyodide/pyodide/5016/workflows/e2cc0865-a91d-4cdb-8dbc-dfaeeb522c5f/jobs/60731

JavascriptException message:  Traceback (most recent call last):
  File "/lib/python3.10/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/lib/python3.10/asyncio/tasks.py", line 232, in __step
    result = coro.send(None)
  File "/lib/python3.10/_pyodide/_base.py", line 531, in eval_code_async
    await CodeRunner(
  File "/lib/python3.10/_pyodide/_base.py", line 357, in run_async
    coroutine = eval(self.code, globals, locals)
  File "<exec>", line 1, in <module>
  File "/lib/python3.10/site-packages/xarray/__init__.py", line 1, in <module>
    from . import testing, tutorial
  File "/lib/python3.10/site-packages/xarray/tutorial.py", line 17, in <module>
    from .backends.api import open_dataset as _open_dataset
  File "/lib/python3.10/site-packages/xarray/backends/__init__.py", line 16, in <module>
    from .scipy_ import ScipyBackendEntrypoint, ScipyDataStore
  File "/lib/python3.10/site-packages/xarray/backends/scipy_.py", line 239, in <module>
    class ScipyBackendEntrypoint(BackendEntrypoint):
  File "/lib/python3.10/site-packages/xarray/backends/scipy_.py", line 260, in ScipyBackendEntrypoint
    available = module_available("scipy")
  File "/lib/python3.10/site-packages/xarray/core/utils.py", line 1009, in module_available
    return importlib.util.find_spec(module) is not None
  File "/lib/python3.10/importlib/util.py", line 103, in find_spec
    return _find_spec(fullname, parent_path)
  File "/lib/python3.10/_pyodide/_importhook.py", line 226, in find_spec
    raise ModuleNotFoundError(
ModuleNotFoundError: The module 'scipy' is included in the Pyodide distribution, but it is not installed. You can install it by calling: await micropip.install("scipy") in Python or await pyodide.loadPackage("scipy") in JavaScript. See https://pyodide.org/en/stable/usage/loading-packages.html for more details.

this seems like a bug in importlib

@headtr1ck
Copy link
Collaborator Author

headtr1ck commented Nov 16, 2022

I don't think this is a bug in importlib but rather pyodide. They seem to add a hook to importlib (which is allowed, but probably causing this issue?)

Edit: The error message from pyodide seems more like this is intentional? Is scipy really installed correctly?

@dcherian
Copy link
Contributor

Turns out it is a pyodide bug with how they hook in to importlib

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
io plan to merge Final call for comments run-benchmark Run the ASV benchmark workflow topic-arrays related to flexible array support topic-backends topic-dask topic-indexing topic-performance topic-plotting topic-zarr Related to zarr storage library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Long import time
5 participants