Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix dask meta and output_dtypes error #5449

Merged
merged 3 commits into from
Jun 7, 2021

Conversation

mathause
Copy link
Collaborator

@mathause mathause commented Jun 7, 2021

This was changed in dask/dask#7669. Looks like they did not deprecate this behavior (i.e. passing both meta and output_dtypes). I'd suggest to follow dask's example here and not add a deprecation cycle. Thoughts?

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mathause

@mathause
Copy link
Collaborator Author

mathause commented Jun 7, 2021

thanks for the review - let's merge soon

@dcherian dcherian merged commit 34dc577 into pydata:master Jun 7, 2021
@mathause mathause deleted the apply_ufunc_dtype_meta branch June 7, 2021 21:07
@max-sixty
Copy link
Collaborator

Thanks @mathause !

@github-actions
Copy link
Contributor

github-actions bot commented Jun 7, 2021

Unit Test Results

0 files  ±0  0 suites  ±0   0s ⏱️ ±0s
0 tests ±0  0 ✔️ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 34dc577. ± Comparison against base commit 34dc577.

@jrbourbeau
Copy link
Contributor

Apologies for the breaking change. I'm happy to make sure we ping an Xarray dev next time we have a significant change to apply_gufunc to avoid surprises like this in the future.

@gjoseph92
Copy link

@mathause sorry for breaking things here. Note that passing output_dtypes didn't work as it was supposed to before, and also didn't cause a cast. We went back and forth on whether output_types should cause explicit casting, and whether it was sensible to provide both it and meta. Ultimately we decided they should be mutually exclusive, and should not cause casting, but without much knowledge of how downstream libraries were using these arguments. So maybe we should revisit that choice in dask?

Also I think maybe this test should be changed rather than skipped. Saying output_dtypes=[int] and then assert float == actual.dtype just seems weird to me. Perhaps removing one of output_dtypes or meta from the test would be the best solution.

@dcherian
Copy link
Contributor

dcherian commented Jun 7, 2021

but without much knowledge of how downstream libraries were using these arguments.

IIRC xarray doesn't actually use it. xarray.apply_ufunc is a convenient wrapper for apply_gufunc so we just pass along what the user provides.

@kmuehlbauer added that test so maybe he can provide more context

Perhaps removing one of output_dtypes or meta from the test would be the best solution.

OK good point! There are a few tests for output_dtypes but we don't actually test for providing dtype in meta, so perhaps we should do that.

@mathause
Copy link
Collaborator Author

mathause commented Jun 8, 2021

No worries. I agree that only allowing one is cleaner. And definitively +1 on adding tests using meta.

@kmuehlbauer
Copy link
Contributor

Thanks for the ping @dcherian.

There is this test:

def test_vectorize_dask_dtype_meta():
# meta dtype takes precedence
data_array = xr.DataArray([[0, 1, 2], [1, 2, 3]], dims=("x", "y"))
expected = xr.DataArray([1, 2], dims=["x"])
actual = apply_ufunc(
pandas_median,
data_array.chunk({"x": 1}),
input_core_dims=[["y"]],
vectorize=True,
dask="parallelized",
output_dtypes=[int],
dask_gufunc_kwargs=dict(meta=np.ndarray((0, 0), dtype=float)),
)
assert_identical(expected, actual)
assert float == actual.dtype
which provides dtype in meta. Maybe these different tests have to be aligned/combined? Unfortunately memory gets hazy over time, so I can't say much more about it without a deeper dive into.

QuLogic pushed a commit to QuLogic/xarray that referenced this pull request Jul 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

🟠 Test failure on master
6 participants