Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Provide a meta array to change array type returned #501

Closed
wants to merge 10 commits into from
Closed

WIP: Provide a meta array to change array type returned #501

wants to merge 10 commits into from

Conversation

jakirkham
Copy link
Member

Adds a user provided meta-array to change the array types allocated.

cc @mrocklin (thanks for the tip! 😄)

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • Docs build locally (e.g., run tox -e docs)
  • AppVeyor and Travis CI passes
  • Test coverage is 100% (Coveralls passes)

zarr/core.py Outdated
@@ -711,7 +719,7 @@ def _get_basic_selection_zd(self, selection, out=None, fields=None):

except KeyError:
# chunk not initialized
chunk = np.zeros((), dtype=self._dtype)
chunk = np.zeros_like(self._meta, shape=(), dtype=self._dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beware, this may restrict you to a somewhat recent version of Numpy

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(was just writing that)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can always have a safe wrapper for this, like we do in dask: https://github.com/dask/dask/blob/master/dask/array/utils.py#L288-L334

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is tricky if the storage layer is already producing an array of a different type, this will then cause a conversion. With CuPy for example, we know this is not an option ( cupy/cupy#589 ). I'm not sure if we shouldn't just error out earlier.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand, why won't CuPy work? CuPy supports the *_like functions in here since cupy/cupy#2171.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meaning that when we can't use *_like functions we will create a NumPy array here, but we may be trying to store CuPy arrays into it. So this would fail because the CuPy arrays would be coerced to NumPy arrays. This is just an example.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess that goes back to the array creation issue that we're trying to address in NumPy. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep :)

@mrocklin
Copy link
Contributor

mrocklin commented Nov 8, 2019

I'm glad to have provided a helpful tip. Also cc'ing @pentschev, who I'm sure will appreciate seeing others go through the same process that he pioneered.

@jakirkham
Copy link
Member Author

So this relies on the shape keyword in the NumPy *_like functions, which was introduced in NumPy 1.17. IDK if we are comfortable bumping to 1.17.0 as a minimum requirement or if need to add some fallback logic. Feedback on this point would be welcome. 🙂

@jakirkham jakirkham changed the title Provide a meta array to change array type returned WIP: Provide a meta array to change array type returned Nov 8, 2019
@jakirkham
Copy link
Member Author

This may need some tweaks as flag.writeable doesn't seem to always be a thing.

@pentschev
Copy link

That's cool @jakirkham that you're adding the meta array here as well. Good luck with that! :)

@jakirkham
Copy link
Member Author

Maybe @alimanfoo or @jrbourbeau can provide some feedback to help determine what we do for older NumPy versions.

@pep8speaks
Copy link

pep8speaks commented Nov 11, 2019

Hello @jakirkham! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-02-18 17:27:20 UTC

Previously we were calling `ascontiguousarray` and `asfortranarray`
based on whether we wanted a C or Fortran ordered array. However we can
simplify this by using `astype` to capture both cases. Additionally
those functions ensure the array is at least 1-D, but that doesn't seem
to matter for this use case since the scalar case is handled in a
different branch. So we leave that part out.
@jakirkham
Copy link
Member Author

Should add we will need to update the array creation functions accordingly.

Allows users to effectively override what kind of array is used to
allocate space and fill with data.
By using the `_meta_array` provided and NumPy's `*_like` functions,
allocate and fill arrays that match the type expected by the user.
To make sure that libraries that rely on pickling objects (like Dask or
other parallel libraries) respect the `_meta_array` attribute, pass it
along as part of the object's state when `__getstate__` is called. Since
`__setstate__` just calls the constructor with all arguments, we can be
sure that `_meta_array` will then be passed back into the new `Array`.
Though this all assumes that the `_meta_array` is something that can be
pickled.
As some arrays may not have `flags` or may not have read-only modes,
workaround these by using an empty `dict` if `flags` is not available
and assuming `writeable` is `True` if not specified.
To simplify the use of `meta_array` within a hierarchy, add a
`_meta_array` attribute to `Group`. This way `Array`s it contains and
other `Group`s can reuse the same `_meta_array` attribute.
To ensure all contained `Group`s and `Array`s use the same
`_meta_array`, make sure that `Group` initializes them with that value
when selecting them.
To make sure that libraries that rely on pickling objects (like Dask or
other parallel libraries) respect the `_meta_array` attribute, pass it
along as part of the object's state when `__getstate__` is called. Since
`__setstate__` just calls the constructor with all arguments, we can be
sure that `_meta_array` will then be passed back into the new `Group`.
Though this all assumes that the `_meta_array` is something that can be
pickled.
@alimanfoo
Copy link
Member

Hi @jakirkham, this is interesting :-)

Would be cool to understand a bit more about specific goals here.

I'm guessing one goal is that when slicing a zarr array (__getitem__) a user wants the result returned as something other than a numpy array, e.g., a cupy array? If so, is there a potential performance gain, or is it just about convenience to the user?

What's the goal when storing data into a zarr array (__setitem__)?

Re numpy version compatibility, we've only recently upgraded to numpy 1.17 ourselves, I suspect we'll need to be friendly to users who haven't done so yet.

@jakirkham jakirkham deleted the branch zarr-developers:master May 23, 2022 20:57
@jakirkham jakirkham closed this May 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants