implement `dask` methods on `DataTree` #9670
Conversation
data = self._to_dataset_view(rebuild_dims=False, inherit=inherit)._copy(
    deep=deep, memo=memo
)
Good catch!
we're still missing an implementation for
Co-authored-by: Tom Nicholas <tom@cworthy.org>
xarray/core/datatree.py
Outdated
{
    dim: size
    for dim, size in combined_chunks.items()
    if dim in node.dataset.dims
I had to use `node.dataset.dims` to avoid including inherited dims (which we can't chunk anyway, because we only inherit indexed dims, and there is no chunked index so far).
Otherwise, I just saw `_node_dims` in `_get_all_dims`, so that might have less overhead. I'll go ahead and use that instead.
That will also be more explicit as it will avoid using "rebuilt" dims (which doesn't really matter anyway because we can't chunk indexes).
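To make the per-node filtering discussed above concrete, here is a small standalone sketch of the restriction step; `node_chunks` and its arguments are illustrative names, not the actual helpers in `datatree.py`:

```python
from collections.abc import Hashable, Mapping


def node_chunks(
    combined_chunks: Mapping[Hashable, tuple[int, ...]],
    node_dims: set[Hashable],
) -> dict[Hashable, tuple[int, ...]]:
    """Restrict a tree-wide chunks mapping to the dims a single node defines.

    Inherited dims are deliberately excluded: only indexed dims are inherited,
    and indexes cannot be chunked, so passing them down again would be redundant.
    """
    return {
        dim: sizes
        for dim, sizes in combined_chunks.items()
        if dim in node_dims
    }


# Example: a node that only defines "x" receives just the "x" chunks.
print(node_chunks({"x": (2, 2), "time": (10,)}, {"x"}))  # {'x': (2, 2)}
```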
I'm happy to help, but I think that (a)
That would be fine with me. That leaves us with the question of whether we should guard against
We should, see #9670 (comment)
Yeah, I saw that one after posting. Should be done now, though.
xarray/core/datatree.py
Outdated
Mapping from group paths to a mapping of dimension names to block lengths for this datatree's data, or None if
the underlying data is not a dask array.
Apparently this docstring (and the one for `.chunksizes` on `Dataset` etc.) is wrong, because this method will never return `None`.
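As a quick check of that point, here is a sketch (assuming dask is installed; exact reprs may differ): for purely in-memory data `.chunksizes` comes back as an empty mapping rather than `None`.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"a": ("x", np.arange(4))})

# In-memory (non-dask) data: an empty mapping, not None.
print(ds.chunksizes)  # e.g. Frozen({})

# Dask-backed data: dimension names mapped to block lengths.
print(ds.chunk({"x": 2}).chunksizes)  # e.g. Frozen({'x': (2, 2)})
```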
Actually, I just tried to compute a
Tell me what you think about the reworded docstrings of
Looks good! Let's merge.
(I just noticed that
Should be good now, hopefully.
* main:
  Add `DataTree.persist` (pydata#9682)
  Typing annotations for arithmetic overrides (e.g., DataArray + Dataset) (pydata#9688)
  Raise `ValueError` for unmatching chunks length in `DataArray.chunk()` (pydata#9689)
  Fix inadvertent deep-copying of child data in DataTree (pydata#9684)
  new blank whatsnew (pydata#9679)
  v2024.10.0 release summary (pydata#9678)
  drop the length from `numpy`'s fixed-width string dtypes (pydata#9586)
  fixing behaviour for group parameter in `open_datatree` (pydata#9666)
  Use zarr v3 dimension_names (pydata#9669)
  fix(zarr): use inplace array.resize for zarr 2 and 3 (pydata#9673)
  implement `dask` methods on `DataTree` (pydata#9670)
  support `chunks` in `open_groups` and `open_datatree` (pydata#9660)
  Compatibility for zarr-python 3.x (pydata#9552)
  Update to_dataframe doc to match current behavior (pydata#9662)
  Reduce graph size through writing indexes directly into graph for ``map_blocks`` (pydata#9658)

* main: (85 commits)
  Refactor out utility functions from to_zarr (pydata#9695)
  Use the same function to floatize coords in polyfit and polyval (pydata#9691)
  Add `DataTree.persist` (pydata#9682)
  Typing annotations for arithmetic overrides (e.g., DataArray + Dataset) (pydata#9688)
  Raise `ValueError` for unmatching chunks length in `DataArray.chunk()` (pydata#9689)
  Fix inadvertent deep-copying of child data in DataTree (pydata#9684)
  new blank whatsnew (pydata#9679)
  v2024.10.0 release summary (pydata#9678)
  drop the length from `numpy`'s fixed-width string dtypes (pydata#9586)
  fixing behaviour for group parameter in `open_datatree` (pydata#9666)
  Use zarr v3 dimension_names (pydata#9669)
  fix(zarr): use inplace array.resize for zarr 2 and 3 (pydata#9673)
  implement `dask` methods on `DataTree` (pydata#9670)
  support `chunks` in `open_groups` and `open_datatree` (pydata#9660)
  Compatibility for zarr-python 3.x (pydata#9552)
  Update to_dataframe doc to match current behavior (pydata#9662)
  Reduce graph size through writing indexes directly into graph for ``map_blocks`` (pydata#9658)
  Add close() method to DataTree and use it to clean-up open files in tests (pydata#9651)
  Change URL for pydap test (pydata#9655)
  Fix multiple grouping with missing groups (pydata#9650)
  ...
For now this only contains `compute` and `load`, but in implementing those I've found a bug in `copy`: we don't actually (shallow) copy the variables, which means that my implementation of `compute` would modify the original tree.

support `chunks` in `open_groups` and `open_datatree` #9660
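A minimal sketch of the intended behaviour (assuming dask is installed and the current `DataTree` constructor; the exact `chunksizes` layout shown in the comments is an assumption): once `copy` shallow-copies the variables, `compute` returns a new, fully loaded tree and leaves the original lazy instead of loading it in place.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"a": ("x", np.arange(4))}).chunk({"x": 2})
tree = xr.DataTree(dataset=ds)

computed = tree.compute()

# The original stays dask-backed; only the returned copy holds loaded data.
print(tree.chunksizes)      # e.g. {'/': Frozen({'x': (2, 2)})}
print(computed.chunksizes)  # e.g. {'/': Frozen({})}
```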