support `chunks` in `open_groups` and `open_datatree` #9660
Conversation
That was quick, @keewis!

Thanks for doing this @keewis!
xarray/backends/api.py
(Outdated)

# ds.set_close(backend_ds._close)
`backend_tree` should have been created using `datatree_from_dict_with_io_cleanup`, so one way to handle this could be to just copy over the `_close` attribute from every node of `backend_tree`?
The question is, do we even need that here? I copied this from `open_dataset`, where this is explicitly set, but since `datatree_from_dict_with_io_cleanup` does this already we might be able to just remove it.

The only reason I kept the commented-out line is to discuss whether the shift in paradigm (have the backend set `_close` vs. do it for all backends the same way) is intentional, and whether we should do the same for `open_dataset`.
I agree it would be nice to remove this; I'm just worried that mapping over each `.dataset` might not properly propagate `._close` (does it? should it?)
It does not (I think), so I'm explicitly copying it over. So far that doesn't appear to cause anything to break.

I've copied over the docstring of
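The copy-over approach discussed above can be sketched roughly as follows. This is a minimal illustration, not xarray's actual `DataTree` implementation: `Node` and `copy_close_callbacks` are hypothetical stand-ins. The point is that mapping over each node's dataset drops the `_close` callback, so it has to be carried over explicitly, node by node.

```python
# Hypothetical sketch -- `Node` and `copy_close_callbacks` are stand-ins,
# not xarray's real DataTree API.

class Node:
    def __init__(self, path, close=None):
        self.path = path
        self._close = close  # callable that releases the underlying file handle


def copy_close_callbacks(backend_nodes, result_nodes):
    """Copy `_close` from each backend node onto the matching result node."""
    for backend_node, result_node in zip(backend_nodes, result_nodes):
        result_node._close = backend_node._close


closed = []
backend_nodes = [
    Node("/", close=lambda: closed.append("/")),
    Node("/child", close=lambda: closed.append("/child")),
]
# e.g. nodes produced by mapping over the datasets, which lost `_close`:
result_nodes = [Node("/"), Node("/child")]

copy_close_callbacks(backend_nodes, result_nodes)
for node in result_nodes:
    node._close()  # each node can now release its file handle
```

Calling `_close` on every result node then releases each underlying handle, which is what `DataTree.close()` from #9651 relies on.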
Looks great!
Co-authored-by: Tom Nicholas <tom@cworthy.org>
Co-authored-by: Tom Nicholas <TomNicholas@users.noreply.github.com>
I'm happy to merge this, @keewis?

me too! Feel free to go ahead and merge.
* adding draft for fixing behaviour for group parameter
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* new trial
* new trial
* fixing duplicate paths and path in the root group
* removing yield str(gpath)
* implementing the proposed solution to hdf5 and netcdf backends
* adding changes to whats-new.rst
* removing encoding['source_group'] line to avoid conflicts with PR #9660
* adding test
* adding test
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* adding assert subgroup_tree.root.parent is None
* modifying tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Update xarray/tests/test_backends_datatree.py (Co-authored-by: Justus Magin <keewis@users.noreply.github.com>)
* applying suggested changes
* updating test
* adding Justus and Alfonso to the list of contributors to the DataTree entry
* adding Justus and Alfonso to the list of contributors to the DataTree entry

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tom Nicholas <tom@cworthy.org>
Co-authored-by: Justus Magin <keewis@users.noreply.github.com>
* main:
  * Add `DataTree.persist` (pydata#9682)
  * Typing annotations for arithmetic overrides (e.g., DataArray + Dataset) (pydata#9688)
  * Raise `ValueError` for unmatching chunks length in `DataArray.chunk()` (pydata#9689)
  * Fix inadvertent deep-copying of child data in DataTree (pydata#9684)
  * new blank whatsnew (pydata#9679)
  * v2024.10.0 release summary (pydata#9678)
  * drop the length from `numpy`'s fixed-width string dtypes (pydata#9586)
  * fixing behaviour for group parameter in `open_datatree` (pydata#9666)
  * Use zarr v3 dimension_names (pydata#9669)
  * fix(zarr): use inplace array.resize for zarr 2 and 3 (pydata#9673)
  * implement `dask` methods on `DataTree` (pydata#9670)
  * support `chunks` in `open_groups` and `open_datatree` (pydata#9660)
  * Compatibility for zarr-python 3.x (pydata#9552)
  * Update to_dataframe doc to match current behavior (pydata#9662)
  * Reduce graph size through writing indexes directly into graph for ``map_blocks`` (pydata#9658)
* main: (85 commits)
  * Refactor out utility functions from to_zarr (pydata#9695)
  * Use the same function to floatize coords in polyfit and polyval (pydata#9691)
  * Add `DataTree.persist` (pydata#9682)
  * Typing annotations for arithmetic overrides (e.g., DataArray + Dataset) (pydata#9688)
  * Raise `ValueError` for unmatching chunks length in `DataArray.chunk()` (pydata#9689)
  * Fix inadvertent deep-copying of child data in DataTree (pydata#9684)
  * new blank whatsnew (pydata#9679)
  * v2024.10.0 release summary (pydata#9678)
  * drop the length from `numpy`'s fixed-width string dtypes (pydata#9586)
  * fixing behaviour for group parameter in `open_datatree` (pydata#9666)
  * Use zarr v3 dimension_names (pydata#9669)
  * fix(zarr): use inplace array.resize for zarr 2 and 3 (pydata#9673)
  * implement `dask` methods on `DataTree` (pydata#9670)
  * support `chunks` in `open_groups` and `open_datatree` (pydata#9660)
  * Compatibility for zarr-python 3.x (pydata#9552)
  * Update to_dataframe doc to match current behavior (pydata#9662)
  * Reduce graph size through writing indexes directly into graph for ``map_blocks`` (pydata#9658)
  * Add close() method to DataTree and use it to clean-up open files in tests (pydata#9651)
  * Change URL for pydap test (pydata#9655)
  * Fix multiple grouping with missing groups (pydata#9650)
  * ...
In trying to support `chunks` the way we do for `open_dataset`, I had to add a lot of parameters to the top-level `open_datatree` and `open_groups` functions. I'm also still looking for the equivalent of `_protect_dataset_variables_inplace`, and finally `_dataset_from_backend_dataset` has been the place to call `set_close`, while #9651 pushed this to the backends for datatree.

No tests yet, and I also want to improve the docstrings before merging.

(the chunked array methods – `chunk`, `load`, `compute`, and `persist` – should be a separate PR)

cc @TomNicholas, @shoyer
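At its core, per-group chunking amounts to applying the same `chunks` argument to the dataset of every group, much like `open_dataset` does for a single dataset. A rough sketch of that idea, using a hypothetical `FakeDataset` stand-in rather than the actual backend machinery (`chunk_all_groups` is an illustrative helper, not a function in xarray):

```python
# Hypothetical sketch -- `FakeDataset` and `chunk_all_groups` are stand-ins;
# the real implementation goes through xarray's chunk-manager machinery.

class FakeDataset:
    def __init__(self, name):
        self.name = name
        self.chunks = None  # None means "not chunked"

    def chunk(self, chunks):
        # return a new, chunked version rather than mutating in place,
        # mirroring Dataset.chunk's behaviour
        chunked = FakeDataset(self.name)
        chunked.chunks = dict(chunks)
        return chunked


def chunk_all_groups(groups, chunks):
    """Apply the same `chunks` mapping to the dataset of every group."""
    return {path: ds.chunk(chunks) for path, ds in groups.items()}


groups = {"/": FakeDataset("root"), "/child": FakeDataset("child")}
chunked = chunk_all_groups(groups, {"x": 10})
```

The chunked result can then be assembled into a tree, e.g. via `datatree_from_dict_with_io_cleanup`, which also wires up the `_close` callbacks discussed above.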