-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid duplicate Zarr array read #8472
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the logical next step after #8297.
xarray/backends/zarr.py
Outdated
@@ -63,13 +63,13 @@ def encode_zarr_attr_value(value): | |||
class ZarrArrayWrapper(BackendArray): | |||
__slots__ = ("datastore", "dtype", "shape", "variable_name", "_array") | |||
|
|||
def __init__(self, variable_name, datastore): | |||
def __init__(self, variable_name, zarr_array, datastore): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def __init__(self, variable_name, zarr_array, datastore): | |
def __init__(self, zarr_array): |
Fairly sure this should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test this properly. I've not had good results doing this trick on a different datastore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the heads up! what issues did you see?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not completely sure how the zarr backend is implemented, but in my case after xr.open_dataset()
the datastore is closed and not needed anymore( until .compute()).
But in this case you're sending a reference from that store to the backendarray which will be lost once the store is closed.
I've gone the opposite way, in get_variables
only the required keys are sent. Then only once the backendarray __getitem__
is triggered the arrays are loaded.
I noticed this quite quickly once I played around with the dataset and loaded some variables and if you haven't noticed it yet maybe the zarr backend does some interesting tricks to avoid the closed references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yerah not sure. We do have distributed tests for reading zarr so presumably it's all OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoop whoop!
We already get the underlying Zarr array in
xarray/xarray/backends/zarr.py
Lines 529 to 531 in bb8511e
and then pass it to
open_store_variable
. Just pass that array down toZarrArrayWrapper
instead of reading it from the datastore again.