-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add mode="r+" for to_zarr and use consolidated writes/reads by default #5252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
`mode="r+"` only allows for modifying pre-existing array values in a Zarr store. This makes it a safer default `mode` when doing a limited `region` write. It also offers a nice performance bonus when using consolidated metadata, because the store to modify can be opened in "consolidated" mode. This PR includes several related changes to `to_zarr()`: 1. It adds support for the new `mode="r+"`. 2. `consolidated=True` in `to_zarr()` now means "open in consolidated mode" if using using `mode="r+"`, instead of "write in consolidated mode" (which would not make sense for r+). 3. It allows setting `consolidated=True` when using `region`, mostly for the sake of fast store opening with r+. 4. Validation in `to_zarr()` has been reorganized to always use the _existing_ Zarr group, rather than re-opening zar stores from scratch, which could require additional network requests. 5. Incidentally, I've renamed the `ZarrStore.ds` attribute to `ZarrStore.zarr_group`, which is a much more descriptive name. These changes gave me a ~5x boost in write performance in a large parallel job making use of `to_zarr` with `region`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙌
Question: does this mode still require eager loading of dimension coordinates? |
Nope, not anymore! We decide the existing variables (to figure out/match
their encoding) but that's done in a lazy way. We never put them in an
xarray.Dataset so there is no eager loading of dimension coordinates.
…On Tue, May 4, 2021 at 7:07 AM Ryan Abernathey ***@***.***> wrote:
Question: does this mode still require eager loading of dimension
coordinates?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#5252 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJJFVVIPHXHCNFR2GJXJKDTL75R5ANCNFSM44ALL2AQ>
.
|
I pushed a commit making consolidated metadata the default (see #5251). This was easier to do building on top of my existing refactor here (but conceivably could be separated). |
Thanks @joshmoore for the copy-editing! |
@rabernat gentle ping for review here :) |
@nbren12 @spencerkclark are either of you up for a review here? |
I vote to merge since it's so useful and @shoyer has probably found all the bugs already :) |
Hah :). We've been using this branch at Google for a while now and it seems to be working pretty well. |
best kind of code review! Thanks @shoyer |
Really sorry I didn't get around to review. My excuse is that I moved back to NYC last week and fell behind on everything. Thanks for moving it forward. 💪 |
I'm sorry too! I don't have any good excuse though... |
Well hey, it's still not too late to catch any issues before they turn up
in a released version of Xarray :)
…On Mon, Jun 21, 2021 at 11:56 PM Noah D. Brenowitz ***@***.***> wrote:
I'm sorry too! I don't have any good excuse though...
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#5252 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJJFVXBEQCIEIUDKLD6NBLTUAJXLANCNFSM44ALL2AQ>
.
|
mode="r+"
only allows for modifying pre-existing array values in aZarr store. This makes it a safer default
mode
when doing a limitedregion
write. It also offers a nice performance bonus when usingconsolidated metadata, because the store to modify can be opened in
"consolidated" mode -- rather than painfully slow non-consolidated
mode.
This PR includes several related changes to
to_zarr()
:mode="r+"
.consolidated=True
into_zarr()
now means "open in consolidatedmode" if using using
mode="r+"
, instead of "write in consolidatedmode" (which would not make sense for r+).
consolidated=True
when usingregion
, mostlyfor the sake of fast store opening with r+.
to_zarr()
has been reorganized to always use theexisting Zarr group, rather than re-opening zar stores from
scratch, which could require additional network requests.
ZarrStore.ds
attribute toZarrStore.zarr_group
, which is a much more descriptive name.These changes gave me a ~5x boost in write performance in a large
parallel job making use of
to_zarr
withregion
.pre-commit run --all-files
whats-new.rst