-
-
Notifications
You must be signed in to change notification settings - Fork 282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Provide a meta array to change array type returned #501
Conversation
zarr/core.py
Outdated
@@ -711,7 +719,7 @@ def _get_basic_selection_zd(self, selection, out=None, fields=None): | |||
|
|||
except KeyError: | |||
# chunk not initialized | |||
chunk = np.zeros((), dtype=self._dtype) | |||
chunk = np.zeros_like(self._meta, shape=(), dtype=self._dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beware, this may restrict you to a somewhat recent version of Numpy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(was just writing that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can always have a safe wrapper for this, like we do in dask: https://github.com/dask/dask/blob/master/dask/array/utils.py#L288-L334
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is tricky if the storage layer is already producing an array of a different type, this will then cause a conversion. With CuPy for example, we know this is not an option ( cupy/cupy#589 ). I'm not sure if we shouldn't just error out earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand, why won't CuPy work? CuPy supports the *_like
functions in here since cupy/cupy#2171.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meaning that when we can't use *_like
functions we will create a NumPy array here, but we may be trying to store CuPy arrays into it. So this would fail because the CuPy arrays would be coerced to NumPy arrays. This is just an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I guess that goes back to the array creation issue that we're trying to address in NumPy. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep :)
I'm glad to have provided a helpful tip. Also cc'ing @pentschev, who I'm sure will appreciate seeing others go through the same process that he pioneered. |
So this relies on the |
This may need some tweaks as |
That's cool @jakirkham that you're adding the meta array here as well. Good luck with that! :) |
Maybe @alimanfoo or @jrbourbeau can provide some feedback to help determine what we do for older NumPy versions. |
Hello @jakirkham! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-02-18 17:27:20 UTC |
Previously we were calling `ascontiguousarray` and `asfortranarray` based on whether we wanted a C or Fortran ordered array. However we can simplify this by using `astype` to capture both cases. Additionally those functions ensure the array is at least 1-D, but that doesn't seem to matter for this use case since the scalar case is handled in a different branch. So we leave that part out.
Should add we will need to update the array creation functions accordingly. |
Allows users to effectively override what kind of array is used to allocate space and fill with data.
By using the `_meta_array` provided and NumPy's `*_like` functions, allocate and fill arrays that match the type expected by the user.
To make sure that libraries that rely on pickling objects (like Dask or other parallel libraries) respect the `_meta_array` attribute, pass it along as part of the object's state when `__getstate__` is called. Since `__setstate__` just calls the constructor with all arguments, we can be sure that `_meta_array` will then be passed back into the new `Array`. Though this all assumes that the `_meta_array` is something that can be pickled.
As some arrays may not have `flags` or may not have read-only modes, workaround these by using an empty `dict` if `flags` is not available and assuming `writeable` is `True` if not specified.
To simplify the use of `meta_array` within a hierarchy, add a `_meta_array` attribute to `Group`. This way `Array`s it contains and other `Group`s can reuse the same `_meta_array` attribute.
To ensure all contained `Group`s and `Array`s use the same `_meta_array`, make sure that `Group` initializes them with that value when selecting them.
To make sure that libraries that rely on pickling objects (like Dask or other parallel libraries) respect the `_meta_array` attribute, pass it along as part of the object's state when `__getstate__` is called. Since `__setstate__` just calls the constructor with all arguments, we can be sure that `_meta_array` will then be passed back into the new `Group`. Though this all assumes that the `_meta_array` is something that can be pickled.
Hi @jakirkham, this is interesting :-) Would be cool to understand a bit more about specific goals here. I'm guessing one goal is that when slicing a zarr array ( What's the goal when storing data into a zarr array ( Re numpy version compatibility, we've only recently upgraded to numpy 1.17 ourselves, I suspect we'll need to be friendly to users who haven't done so yet. |
Adds a user provided meta-array to change the array types allocated.
cc @mrocklin (thanks for the tip! 😄)
TODO:
tox -e docs
)