-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change isinstance checks to duck Dask Array checks #4208 #4221
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for going through and finding/adjusting all of these Dask checks!
I've left some comments below, which, other than the one about is_duck_dask_array
still using isinstance
instead of is_dask_collection
, are mostly there to help get discussion going on a few points.
Also, a broader discussion that's I've seen hinted to in the past, but is brought to the forefront by this PR: how should xarray be checking for general duck array types it can wrap? Right now, it looks like there is a mix of is_array_like(data) and hasattr(data, "__array_function__") or isinstance(data, dask_array_type) and it would be nice to bring consistency. However, both these checks seem like they'd let too many types through. For example, |
Revisiting it I now realize that |
As long as that doesn't break any of the current uses, I think that would be the best way forwards. This would require xarray to be on NumPy 1.16+ (in order to ensure is_duck_array(x) and not isinstance(x, (DataArray, Variable)) with Although, while we're at it, do we also need more careful handling of upcast types? I'm not sure if there even are any out there right now (not sure if a HoloViews Dataset counts here), but that doesn't necessarily mean there never will be any. |
I'm getting a bit lost within the testing suite while trying to figure out the implications of this change and what kinds of tests are necessary for it. Would a test module for this
Any suggestions would be greatly appreciated. |
@rpmanser For what it's worth, I'd think doing tests like |
Thanks for working on this @rpmanser. Since you've changed every
Seems like this is covered by We could add specific pint-as-dask tests to This could be a discussion topic for our next dev meeting on wednesday. |
I have erred on the side of leaving the On another note, if it is appropriate to apply these changes:
I can do that as well. If not, should a new issue be opened for this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rpmanser, this is quite close! Thanks for your efforts here.
* Rename `is_array_like` to `is_duck_array` * `is_duck_array` checks for `__array_function__` and `__array_ufunc__` in addition to previous checks * Replace checks for `is_duck_dask_array` and `__array_function__` with `is_duck_array`
we went with xarray/xarray/core/variable.py Line 938 in edd5c1e
xarray/xarray/core/formatting.py Line 554 in 3337b6f
def is_duck_array_or_ndarray(array):
return is_duck_array(array) or (not IS_NEP18_ACTIVE and isinstance(data, np.ndarray)) |
* Add explicit check for NumPy array to is_duck_array * Replace is_duck_array_or_ndarray checks with is_duck_array
I meant |
you mean the commit message? If so, you can use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rpmanser
Co-authored-by: keewis <keewis@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have two comments, otherwise looks good to me.
@@ -389,7 +389,7 @@ def _bottleneck_reduce(self, func, **kwargs): | |||
valid = (slice(None),) * axis + (slice(-shift, None),) | |||
padded = padded.pad({self.dim[0]: (0, -shift)}, mode="constant") | |||
|
|||
if isinstance(padded.data, dask_array_type): | |||
if is_duck_dask_array(padded.data): | |||
raise AssertionError("should not be reachable") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pydata/xarray: why is that here? If it is never reachable, it should be fine to just not do the branching (and if it is necessary, it should be fine to use assert not is_duck_dask_array(padded.data), "should not be reachable"
).
Also, if we raise the code below will never be reached, so we should be able to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's from #3040 . We would like to use this code path but there are bugs IIRC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then I guess we should raise a NotImplementedError
or a ValueError
instead of a AssertionError
and improve the error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could comment it out and add a TODO note. The if
condition in Line 420 makes sure that this is never reached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The reason
for requires_pint_0_15
would be nice but I don't feel strongly about it so it's up to you.
Thanks for being patient, @rpmanser.
No worries, thanks for the reviews! |
Things seem to have died down here so let's test it out. Thanks @rpmanser This is an amazing first PR. Thanks for your efforts and welcome to xarray! |
@dcherian No problem! Thanks for all your help! |
isort -rc . && black . && mypy . && flake8
I added the discussed
is_duck_dask_array(x)
function and replacedisinstance(x, dask_array_type)
checks withis_duck_dask_array(x)
. Existing tests are passing. I did not changeisinstance(x, dask_array_type)
checks in the testing modules since I am unsure if that is appropriate.As for additional tests, I am leaning towards testing Xarray ( Pint ( Dask ) ) ) objects as implemented in upcoming Pint v0.15 rather than a mock class, but I am open to discussion. I am unfamiliar with the release schedules of Pint and Xarray but I imagine that would influence which direction the testing should go.