Generalize handling of chunked array types #7019
Merged
Changes from 183 commits
189 commits
15fc2b8
generalise chunk methods to allow cubed
TomNicholas 5e05b71
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] cff89ee
fic typing typo
TomNicholas 039973b
Merge branch 'cubed_integration' of https://github.com/TomNicholas/xa…
TomNicholas 60d44bc
fixed circular import
TomNicholas 5ddba7e
fix some mypy errors
TomNicholas 37d0d66
added cubed to mypy ignore list
TomNicholas 67d7efc
Merge branch 'main' into cubed_integration
TomNicholas cdcb3fb
simplify __array_ufunc__ check
TomNicholas 73e4563
Revert "simplify __array_ufunc__ check" as I pushed to wrong branch
TomNicholas 5995685
update cubed array type
TomNicholas 46223ae
Merge branch 'main' into cubed_integration
TomNicholas 320b09f
fix missed conflict
TomNicholas 3facfd6
sketch for ChunkManager adapter class
TomNicholas c616a85
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] ecabaa4
Remove erroneous docstring about usage of map_blocks
TomNicholas e53a588
apply_ufunc -> apply_gufunc
TomNicholas fe21edd
chunk -> from_array
TomNicholas 3f6aedc
remove staticmethods
TomNicholas ea8f482
attempt to type methods of ABC
TomNicholas c49ab8e
from_array
TomNicholas 26d1868
attempt to specify types
TomNicholas e9b4a33
method for checking array type
TomNicholas 3a43b00
Merge branch 'main' into pr/7019
Illviljan c7c9589
Update pyproject.toml
Illviljan fc051e3
Merge branch 'main' into cubed_integration
TomNicholas 56e9d0f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 3b16cca
fixed import errors
TomNicholas 7ac3323
generalize .chunk method kwargs
TomNicholas e732b87
used dask functions in dask chunkmanager
TomNicholas 68930eb
Merge branch 'cubed_integration' of https://github.com/TomNicholas/xa…
TomNicholas 8442e1f
define signatures for apply_gufunc, blockwise, map_blocks
TomNicholas 3717431
prototype function to detect which parallel backend to use
TomNicholas 78d8969
Merge branch 'main' into cubed_integration
TomNicholas 7ac6531
add cubed.apply_gufunc
TomNicholas e423bfb
ruffify
TomNicholas 149db9d
add rechunk and compute methods for cubed
TomNicholas 280c563
xr.apply_ufunc now dispatches to chunkmanager.apply_gufunc
TomNicholas 42186e7
CubedManager.chunks
TomNicholas 103a755
attempt to keep dask and cubed imports lazy
TomNicholas f2bce3d
generalize idxmax
TomNicholas f09947d
move unify_chunks import to ChunkManager
TomNicholas e760f10
generalize Dataset.load()
TomNicholas b1a4e35
check explicitly for chunks attribute instead of hard-coding cubed
TomNicholas 5320f4d
better function names
TomNicholas 45ed5d2
add cubed version of unify_chunks
TomNicholas eec096b
recognize wrapped duck dask arrays (e.g. pint wrapping dask)
TomNicholas c64ff5f
add some tests for fetching ChunkManagers
TomNicholas 8a37905
add from_array_kwargs to open_dataset
TomNicholas 989d6bb
add from_array_kwargs to open_zarr
TomNicholas 8c7fe79
pipe constructors through chunkmanager
TomNicholas 0222b55
generalize map_blocks inside coding
TomNicholas 9d6cf6b
Merge branch 'main' into cubed_integration
TomNicholas afc6abc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 2c0cc26
fixed full_like
TomNicholas 1a255cf
Merge branch 'cubed_integration' of https://github.com/TomNicholas/xa…
TomNicholas c398d98
add from_array_kwargs to open_zarr
TomNicholas 598bf12
don't import dask.tokenize
TomNicholas 7bef188
fix bugs with passing from_array_kwargs down
TomNicholas 7af5395
generalise reductions by adding to chunkmanager
TomNicholas 287e96c
moved nanfirst/nanlast to duck_array_ops from dask_array_ops
TomNicholas 8bbc141
generalize interp
TomNicholas 6cfe9fa
generalized chunk_hint function inside indexing
TomNicholas 4ca044b
DaskIndexingAdapter->ChunkedIndexingAdapter
TomNicholas 8ed5ed6
Merge branch 'main' into cubed_integration
TomNicholas 2a4c38b
Revert "DaskIndexingAdapter->ChunkedIndexingAdapter"
TomNicholas 45a4c98
pass cubed-related kwargs down through to_zarr by adding .store to Ch…
TomNicholas dee5b33
fix typing_extensions on py3.9
TomNicholas 176d7fa
fix ImportError with cubed array type
TomNicholas 9e58d6d
give up trying to import TypeAlias in CI
TomNicholas a6219a0
fix import of T_Chunks
TomNicholas 9f21994
fix no_implicit_optional warnings
TomNicholas eb7bb0b
don't define CubedManager if cubed can't be imported
TomNicholas 57733de
fix local mypy errors
TomNicholas 4c58b28
don't explicitly pass enforce_ndim into dask.array.map_blocks
TomNicholas d07830c
fix drop_axis default
TomNicholas c1bf040
Merge branch 'main' into cubed_integration
TomNicholas 3ae21d9
use indexing adapter on cubed arrays too
TomNicholas 7ef0129
use array API-compatible version of astype function
TomNicholas ec22963
whatsnew
TomNicholas 4c8d773
document new kwargs
TomNicholas f4de577
add chunkmanager entrypoint
TomNicholas 1cd7283
move CubedManager to a separate package
TomNicholas 5386711
guess chunkmanager based on whats available
TomNicholas 6b173de
Merge branch 'main' into cubed_integration
TomNicholas c431a5f
fix bug with tokenizing
TomNicholas 7ab9047
adapt tests to emulate existence of entrypoint
TomNicholas 72f8f5f
use fixture to setup/teardown dummy entrypoint
TomNicholas 34c6aea
refactor to make DaskManager unavailable if dask not installed
TomNicholas fb9466d
typing
TomNicholas ffd2e21
Merge branch 'main' into cubed_integration
TomNicholas 36b2be0
move whatsnew to latest xarray version
TomNicholas 77a1e4e
remove superfluous lines from whatsnew
TomNicholas a6222f9
fix bug where zarr backend attempted to use dask when not installed
TomNicholas 61fe236
Remove rogue print statement
TomNicholas 447d1f1
Merge branch 'cubed_integration' of https://github.com/TomNicholas/xa…
TomNicholas a7a6a6e
Clarify what's new
TomNicholas aa64996
use monkeypatch to mock registering of dummy chunkmanager
TomNicholas fec1a13
Merge branch 'cubed_integration' of https://github.com/TomNicholas/xa…
TomNicholas db11947
more tests for guessing chunkmanager correctly
TomNicholas 2c18df6
raise TypeError if no chunkmanager found for array types
TomNicholas 2e49154
Correct is_chunked_array check
TomNicholas 748e90d
vendor dask.array.core.normalize_chunks
TomNicholas 70804f4
Merge branch 'cubed_integration' of https://github.com/TomNicholas/xa…
TomNicholas dae2fe4
add default implementation of rechunk in ABC
TomNicholas 4ef500c
remove cubed-specific type check in daskmanager
TomNicholas ba66419
nanfirst->chunked_nanfirst
TomNicholas 7fd4617
revert adding cubed to NON_NUMPY_SUPPORTED_ARRAY_TYPES
TomNicholas 69d77c9
licensing to vendor functions from dask
TomNicholas 8337857
fix bug
TomNicholas 9850a46
ignore mypy error
TomNicholas 488fd5b
separate chunk_manager kwarg from from_array_kwargs dict
TomNicholas 00bcf6c
rename kwarg to chunked_array_type
TomNicholas ff1f8ab
Merge branch 'main' into cubed_integration
TomNicholas 844726d
refactor from_array_kwargs in .chunk ready for deprecation
TomNicholas 3d56a3d
print statements in test so I can comment on them
TomNicholas 1952c55
remove print statements now I've commented on them in PR
TomNicholas 3ba8d42
should fix dask naming tests
TomNicholas b15411c
Merge branch 'main' into cubed_integration
TomNicholas 53d6094
make dask-specific kwargs explicit in from_array
TomNicholas 7dc6581
debugging print statements
TomNicholas fcaf499
Revert "debugging print statements"
TomNicholas 64df7e8
fix gnarly bug with auto-determining chunksizes caused by not referri…
TomNicholas 747ada5
hopefully fix broken docstring
TomNicholas 9b33ab7
Revert "make dask-specific kwargs explicit in from_array"
TomNicholas c8d5aa1
Merge branch 'main' into cubed_integration
TomNicholas 6a7a043
show chunksize limit used in failing tests
TomNicholas 20f92c6
move lazy indexing adapter up out of chunkmanager code
TomNicholas 796a577
try upgrading minimum version of dask
TomNicholas 29d0c92
Revert "try upgrading minimum version of dask"
TomNicholas 031017b
un-vendor dask.array.core.normalize_chunks
TomNicholas a8c3413
Merge branch 'main' into cubed_integration
TomNicholas 14a1226
refactored to all passing ChunkManagerEntrypoint objects directly
TomNicholas 5dd9d35
Remove redundant Nones from types
TomNicholas 5a46294
From future import annotations
TomNicholas d6b56c6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 471d22a
From functools import annotations
TomNicholas 8378f43
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] f8b1020
From future import annotations
TomNicholas 907c15b
Merge branch 'main' into cubed_integration
TomNicholas 11676ab
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 76ce09e
defined type for NormalizedChunks
TomNicholas 127c184
Merge branch 'cubed_integration' of https://github.com/TomNicholas/xa…
TomNicholas 604bbf3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 7604594
standardized capitalization of ChunkManagerEntrypoint
TomNicholas 97537dd
Merge branch 'cubed_integration' of https://github.com/TomNicholas/xa…
TomNicholas 026fe17
Merge branch 'main' into cubed_integration
TomNicholas 355555f
ensure ruff doesn't remove import
TomNicholas 6eac87a
ignore remaining typing errors stemming from unclear dask typing for …
TomNicholas f4224f6
Merge branch 'main' into cubed_integration
TomNicholas 4ec8370
rename store_kwargs->chunkmanager_store_kwargs
TomNicholas 316c63d
missed return value
TomNicholas 9cd9078
array API fixes for astype
TomNicholas 5dc2016
Revert "array API fixes for astype"
TomNicholas 995eb5a
Merge branch 'main' into cubed_integration
dcherian c8b9ee7
Apply suggestions from code review
Illviljan 5c95758
Update xarray/tests/test_parallelcompat.py
Illviljan a61a30a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] ea35a32
overridden -> subclassed
TomNicholas e68b327
from_array_kwargs is optional
TomNicholas 956c055
ensured all compute calls go through chunkmanager
TomNicholas 42ad08e
Merge branch 'cubed_integration' of https://github.com/TomNicholas/xa…
TomNicholas cf0c28e
Raise if multiple chunkmanagers recognize array type
TomNicholas 4f2ec27
from_array_kwargs is optional
TomNicholas 5f2f569
from_array_kwargs is optional
TomNicholas 929db33
from_array_kwargs is optional
TomNicholas 876f81c
from_array_kwargs is optional
TomNicholas ad0a706
from_array_kwargs is optional
TomNicholas 115b52b
fixes for chunk methods
TomNicholas 8741eec
Merge branch 'main' into cubed_integration
TomNicholas a1ba4f0
Merge branch 'cubed_integration' of https://github.com/TomNicholas/xa…
TomNicholas bdf7600
correct readme to reflect fact we aren't vendoring dask in this PR an…
TomNicholas 06bb508
update whatsnew
TomNicholas ba00558
more docstring corrections
TomNicholas 6a99454
remove comment
TomNicholas 95d81e8
Raise NotImplementedErrors in all abstract methods
TomNicholas e5e3096
type hints for every arg in ChunkManagerEntryPOint methods
TomNicholas a221436
more explicit typing + fixes for mypy errors revealed
TomNicholas fe2e9b3
Keyword-only arguments in full_like etc.
TomNicholas 7bcaece
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] fecf7ed
None as default instead of {}
TomNicholas 15dc44b
Merge branch 'cubed_integration' of https://github.com/TomNicholas/xa…
TomNicholas 660ef41
fix bug apparently introduced by changing default type of drop_axis k…
TomNicholas e6d6f1f
Removed hopefully-unnecessary mypy ignore
TomNicholas c7fbe79
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] d728427
removed unnecessary mypy ignores
TomNicholas 6b9fa3f
Merge branch 'cubed_integration' of https://github.com/TomNicholas/xa…
TomNicholas 51db5f2
change default value of drop_axis kwarg in map_blocks and catch when …
TomNicholas c69c563
fix checking of dask version in map_blocks
TomNicholas
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -39,6 +39,7 @@ module = [ | |
"cf_units.*", | ||
"cfgrib.*", | ||
"cftime.*", | ||
+ "cubed.*", | ||
"cupy.*", | ||
"fsspec.*", | ||
"h5netcdf.*", | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,6 +19,7 @@ | |
) | ||
from xarray.backends.store import StoreBackendEntrypoint | ||
from xarray.core import indexing | ||
+ from xarray.core.parallelcompat import guess_chunkmanager | ||
from xarray.core.pycompat import integer_types | ||
from xarray.core.utils import ( | ||
FrozenDict, | ||
|
@@ -716,6 +717,8 @@ def open_zarr( | |
decode_timedelta=None, | ||
use_cftime=None, | ||
zarr_version=None, | ||
+ chunked_array_type: str | None = None, | ||
+ from_array_kwargs: dict[str, Any] | None = None, | ||
**kwargs, | ||
): | ||
"""Load and decode a dataset from a Zarr store. | ||
|
@@ -800,6 +803,15 @@ def open_zarr( | |
The desired zarr spec version to target (currently 2 or 3). The default | ||
of None will attempt to determine the zarr version from ``store`` when | ||
possible, otherwise defaulting to 2. | ||
+ chunked_array_type: str, optional | ||
+ Which chunked array type to coerce this dataset's arrays to. | ||
+ Defaults to 'dask' if installed, else whatever is registered via the `ChunkManagerEntrypoint` system. | ||
+ Experimental API that should not be relied upon. | ||
+ from_array_kwargs: dict, optional | ||
+ Additional keyword arguments passed on to the `ChunkManagerEntrypoint.from_array` method used to create | ||
+ chunked arrays, via whichever chunk manager is specified through the `chunked_array_type` kwarg. | ||
+ Defaults to an empty dict. When the dask chunk manager is used, these kwargs are eventually passed to | ||
+ :py:func:`dask.array.from_array`. Experimental API that should not be relied upon. | ||
Returns | ||
------- | ||
|
@@ -817,12 +829,17 @@ def open_zarr( | |
""" | ||
from xarray.backends.api import open_dataset | ||
|
+ if from_array_kwargs is None: | ||
+ from_array_kwargs = {} | ||
|
if chunks == "auto": | ||
try: | ||
Review comment: Can we handle the import error in |
- import dask.array # noqa | ||
+ guess_chunkmanager( | ||
+ chunked_array_type | ||
+ ) # attempt to import that parallel backend | ||
|
chunks = {} | ||
- except ImportError: | ||
+ except ValueError: | ||
chunks = None | ||
|
if kwargs: | ||
|
@@ -851,6 +868,8 @@ def open_zarr( | |
engine="zarr", | ||
chunks=chunks, | ||
drop_variables=drop_variables, | ||
+ chunked_array_type=chunked_array_type, | ||
+ from_array_kwargs=from_array_kwargs, | ||
backend_kwargs=backend_kwargs, | ||
decode_timedelta=decode_timedelta, | ||
use_cftime=use_cftime, | ||
|
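
For readers following the `chunks == "auto"` hunk above, here is a minimal sketch of the fallback it appears to implement: look up the requested chunk manager, and if the lookup raises `ValueError`, fall back to `chunks = None` (no chunking). The diff only shows that `ValueError` is caught; the lookup itself is not visible here, so `lookup_manager`, `resolve_auto_chunks` and the `registry` dict are hypothetical stand-ins, not xarray's actual implementation.

```python
from __future__ import annotations

from typing import Any


def lookup_manager(registry: dict[str, Any], name: str | None) -> Any:
    """Hypothetical stand-in for guess_chunkmanager.

    Returns the named manager, or the first registered one when name is None.
    Raises ValueError when nothing suitable is registered (an assumption based
    on the `except ValueError` clause in the diff).
    """
    if name is None:
        if not registry:
            raise ValueError("no chunk managers are registered")
        return next(iter(registry.values()))
    if name not in registry:
        raise ValueError(f"unrecognized chunk manager {name!r}")
    return registry[name]


def resolve_auto_chunks(
    chunks: Any, registry: dict[str, Any], chunked_array_type: str | None = None
) -> Any:
    """Mirror the branch in the diff: only chunks == "auto" is rewritten."""
    if chunks == "auto":
        try:
            lookup_manager(registry, chunked_array_type)
            chunks = {}  # a backend is available: use the on-disk chunking
        except ValueError:
            chunks = None  # no backend: load without chunking
    return chunks
```

With an empty registry, `resolve_auto_chunks("auto", {})` falls through to `None`; with a registered manager it returns `{}`. Any explicit `chunks` value is passed through untouched.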
Add `from_array_kwargs` here too?

I actually don't think we need to - `from_array_kwargs` is only going to get directly passed down to `open_dataset`, and hence could be considered part of `**kwargs`.

This should actually just work, except in the case of `parallel=True`. For that we could add `delayed` to the `ChunkManager` ABC, so that if cubed does implement `cubed.delayed` it could be added, else a `NotImplementedError` would be raised. I think all of this wouldn't be necessary if we had lazy concatenation in xarray though (xref #4628). That suggestion would mean we should also replace other instances of `dask.delayed` in other parts of the codebase though... I think I will split this into a separate issue in the interests of getting this one merged.
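
To make the `delayed` idea concrete, here is a rough sketch of an optional method on an abstract base class whose default raises `NotImplementedError`, so that only backends with a delayed-style API need to override it. Everything here is illustrative: `ChunkManagerSketch`, `ThunkManager` and `NoDelayedManager` are hypothetical names, and the thunk-based `delayed` is a toy, not how dask or cubed work.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable


class ChunkManagerSketch(ABC):
    """Hypothetical stand-in for the ChunkManager ABC discussed above."""

    @abstractmethod
    def compute(self, *data: Any) -> tuple:
        """Every backend must be able to compute its lazy objects."""

    def delayed(self, func: Callable[..., Any]) -> Callable[..., Any]:
        # Optional hook: backends without a delayed-style API inherit this default.
        raise NotImplementedError(
            f"{type(self).__name__} does not implement delayed"
        )


class ThunkManager(ChunkManagerSketch):
    """Toy backend: delayed calls become zero-argument thunks."""

    def compute(self, *data: Any) -> tuple:
        return tuple(d() if callable(d) else d for d in data)

    def delayed(self, func: Callable[..., Any]) -> Callable[..., Any]:
        def wrapper(*args: Any, **kwargs: Any) -> Callable[[], Any]:
            return lambda: func(*args, **kwargs)

        return wrapper


class NoDelayedManager(ChunkManagerSketch):
    """Toy backend that only supports compute."""

    def compute(self, *data: Any) -> tuple:
        return tuple(data)
```

A caller handling `parallel=True` could then call `manager.delayed(...)` and let the `NotImplementedError` surface for backends that cannot support it, rather than hard-coding `dask.delayed`.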