-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalize handling of chunked array types #7019
Conversation
for more information, see https://pre-commit.ci
…rray into cubed_integration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
This reverts commit cdcb3fb.
…warg to map_blocks
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
for more information, see https://pre-commit.ci
…rray into cubed_integration
@Illviljan thanks for all your comments! Would you (or @keewis?) be willing to approve this PR now? I would really like to merge this so that I can release a version of xarray that I can use as a dependency for cubed-xarray. |
Thanks @TomNicholas Big change! |
Woooo thanks @dcherian ! |
👏 Congrats @TomNicholas on getting this in! Such an important contribution. 👏 |
Thanks for all your hard work on this @TomNicholas! |
* generalise chunk methods to allow cubed * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fic typing typo * fixed circular import * fix some mypy errors * added cubed to mypy ignore list * simplify __array_ufunc__ check * Revert "simplify __array_ufunc__ check" as I pushed to wrong branch This reverts commit cdcb3fb. * update cubed array type * fix missed conflict * sketch for ChunkManager adapter class * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove erroneous docstring about usage of map_blocks Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com> * apply_ufunc -> apply_gufunc Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com> * chunk -> from_array Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com> * remove staticmethods * attempt to type methods of ABC * from_array * attempt to specify types * method for checking array type * Update pyproject.toml * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed import errors * generalize .chunk method kwargs * used dask functions in dask chunkmanager * define signatures for apply_gufunc, blockwise, map_blocks * prototype function to detect which parallel backend to use * add cubed.apply_gufunc * ruffify * add rechunk and compute methods for cubed * xr.apply_ufunc now dispatches to chunkmanager.apply_gufunc * CubedManager.chunks * attempt to keep dask and cubed imports lazy * generalize idxmax * move unify_chunks import to ChunkManager * generalize Dataset.load() * check explicitly for chunks attribute instead of hard-coding cubed * better function names * add cubed version of unify_chunks * recognize wrapped duck dask arrays (e.g. pint wrapping dask) * add some tests for fetching ChunkManagers * add from_array_kwargs to open_dataset * add from_array_kwargs to open_zarr * pipe constructors through chunkmanager * generalize map_blocks inside coding * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed full_like * add from_array_kwargs to open_zarr * don't import dask.tokenize * fix bugs with passing from_array_kwargs down * generalise reductions by adding to chunkmanager * moved nanfirst/nanlast to duck_array_ops from dask_array_ops * generalize interp * generalized chunk_hint function inside indexing * DaskIndexingAdapter->ChunkedIndexingAdapter * Revert "DaskIndexingAdapter->ChunkedIndexingAdapter" This reverts commit 4ca044b. * pass cubed-related kwargs down through to_zarr by adding .store to ChunkManager * fix typing_extensions on py3.9 * fix ImportError with cubed array type * give up trying to import TypeAlias in CI * fix import of T_Chunks * fix no_implicit_optional warnings * don't define CubedManager if cubed can't be imported * fix local mypy errors * don't explicitly pass enforce_ndim into dask.array.map_blocks * fix drop_axis default * use indexing adapter on cubed arrays too * use array API-compatible version of astype function * whatsnew * document new kwargs * add chunkmanager entrypoint * move CubedManager to a separate package * guess chunkmanager based on whats available * fix bug with tokenizing * adapt tests to emulate existence of entrypoint * use fixture to setup/teardown dummy entrypoint * refactor to make DaskManager unavailable if dask not installed * typing * move whatsnew to latest xarray version * remove superfluous lines from whatsnew * fix bug where zarr backend attempted to use dask when not installed * Remove rogue print statement Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com> * Clarify what's new Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com> * use monkeypatch to mock registering of dummy chunkmanager * more tests for guessing chunkmanager correctly * raise TypeError if no chunkmanager found for array types * Correct is_chunked_array check Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com> * vendor dask.array.core.normalize_chunks * add default implementation of rechunk in ABC * remove cubed-specific type check in daskmanager * nanfirst->chunked_nanfirst * revert adding cubed to NON_NUMPY_SUPPORTED_ARRAY_TYPES * licensing to vendor functions from dask * fix bug * ignore mypy error * separate chunk_manager kwarg from from_array_kwargs dict * rename kwarg to chunked_array_type * refactor from_array_kwargs in .chunk ready for deprecation * print statements in test so I can comment on them * remove print statements now I've commented on them in PR * should fix dask naming tests * make dask-specific kwargs explicit in from_array * debugging print statements * Revert "debugging print statements" This reverts commit 7dc6581. * fix gnarly bug with auto-determining chunksizes caused by not referring to dask.config * hopefully fix broken docstring * Revert "make dask-specific kwargs explicit in from_array" This reverts commit 53d6094. * show chunksize limit used in failing tests * move lazy indexing adapter up out of chunkmanager code * try upgrading minimum version of dask * Revert "try upgrading minimum version of dask" This reverts commit 796a577. * un-vendor dask.array.core.normalize_chunks * refactored to all passing ChunkManagerEntrypoint objects directly * Remove redundant Nones from types Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com> * From future import annotations Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * From functools import annotations Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * From future import annotations Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * defined type for NormalizedChunks * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * standardized capitalization of ChunkManagerEntrypoint * ensure ruff doesn't remove import * ignore remaining typing errors stemming from unclear dask typing for chunks arguments * rename store_kwargs->chunkmanager_store_kwargs * missed return value * array API fixes for astype * Revert "array API fixes for astype" This reverts commit 9cd9078. * Apply suggestions from code review * Update xarray/tests/test_parallelcompat.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * overridden -> subclassed Co-authored-by: Justus Magin <keewis@users.noreply.github.com> * from_array_kwargs is optional Co-authored-by: Justus Magin <keewis@users.noreply.github.com> * ensured all compute calls go through chunkmanager * Raise if multiple chunkmanagers recognize array type Co-authored-by: Justus Magin <keewis@users.noreply.github.com> * from_array_kwargs is optional Co-authored-by: Justus Magin <keewis@users.noreply.github.com> * from_array_kwargs is optional Co-authored-by: Justus Magin <keewis@users.noreply.github.com> * from_array_kwargs is optional Co-authored-by: Justus Magin <keewis@users.noreply.github.com> * from_array_kwargs is optional Co-authored-by: Justus Magin <keewis@users.noreply.github.com> * from_array_kwargs is optional Co-authored-by: Justus Magin <keewis@users.noreply.github.com> * fixes for chunk methods * correct readme to reflect fact we aren't vendoring dask in this PR any more * update whatsnew * more docstring corrections * remove comment * Raise NotImplementedErrors in all abstract methods Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com> * type hints for every arg in ChunkManagerEntryPOint methods * more explicit typing + fixes for mypy errors revealed * Keyword-only arguments in full_like etc. Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * None as default instead of {} * fix bug apparently introduced by changing default type of drop_axis kwarg to map_blocks * Removed hopefully-unnecessary mypy ignore Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removed unnecessary mypy ignores * change default value of drop_axis kwarg in map_blocks and catch when dask version < 2022.9.1 * fix checking of dask version in map_blocks --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com> Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com> Co-authored-by: Justus Magin <keewis@users.noreply.github.com>
Initial attempt to get cubed working within xarray, as an alternative to dask.
whats-new.rst
api.rst
I've added a
manager
kwarg to the.chunk
methods so you can doda.chunk(manager="cubed")
to convert to a chunkedcubed.CoreArray
, with the default still beingda.chunk(manager="dask")
. (I couldn't think of a better name than "manager", as "backend" and "executor" are already taken.)At the moment it should work except for an import error that I don't understand, see below.Fro cubed to work at all with this PR we would also need:
Array
vscubed.CoreArray
cubed-dev/cubed#123apply_gufunc
[WIP] apply_gufunc cubed-dev/cubed#119 - implemented in Addapply_gufunc
cubed-dev/cubed#149 🥳To-dos for me on this PR:
xarray.apply_ufunc
throughcubed.apply_gufunc
instead of dask'sapply_gufunc
when appropriate,from_array_kwargs
to opening functions, e.g.open_zarr
, andopen_dataset
,from_array_kwargs
to creation functions, such asfull_like
,store_kwargs
as a way to propagate cubed-specific kwargs when savingto_zarr
.To complete this project more generally we should also:
cubed.apply_gufunc
support multiple output arguments Support multiple outputs inapply_gufunc
cubed-dev/cubed#152cubed.unify_chunks
to matchdask.array.core.unify_chunks
xarray.map_blocks
to work on cubed arrays, ideally by first rewriting xarray's implementation ofmap_blocks
to usedask.array.map_blocks
cc @tomwhite