
Conversation

@gjoseph92
Collaborator

Refactor apply_gufunc to compute meta itself, rather than relying on blockwise to do it. Having just one code path and consolidating everything on meta makes things easier to reason about, and happens to fix #7668. Additionally, `func` previously would be called to infer the output dtype even when output_dtypes was given (contrary to the docstring); now it is not.

This also makes it an explicit error to pass both output_dtypes and meta—what you want in that case is rather ambiguous.

Also, use a modern keyword-only function signature instead of popping arguments out of kwargs (a pattern that was only needed to support Python 2).

@github-actions github-actions bot added the array label May 18, 2021
gjoseph92 added 2 commits May 18, 2021 14:59
`args_meta[0]` could be wrong when there are multiple arguments. Even this I don't really like, since it's making a lot of assumptions about what `func` might be that may not be true in generality for any user-defined function.
@gjoseph92 gjoseph92 force-pushed the array/apply_gufunc-meta-inference branch from e058bf2 to 7a6b68f May 18, 2021 20:59
Comment on lines +164 to +165
ret = apply_gufunc(foo, "()->()", 1.0, output_dtypes=float, bar=2)
assert_eq(ret, np.array(1.0, dtype=float))
Collaborator Author

This was failing because the array chunks were float64, while meta expected float32 ("f"). I'm not sure why it ever passed before?

Member

If the original was specifying output_dtypes="f" and the assertion fails for that, it probably means somewhere the value is being cast to float64. Seems like a sneaky regression to me.

Maybe it would be worth checking if specifying other output_dtypes would also cast the return value to float64.

Collaborator Author

@pentschev I dug into this and found a number of concerning things:

  • On main, despite saying output_dtypes="f" (float32), ret.dtype is float64! As I mentioned in the description, output_dtypes is currently actually ignored unless inferring the dtype from func fails. Since calling the foo function with a float64 succeeds (the 1.0 input becomes a NumPy float64), apply_gufunc returns a float64 array, even though you asked for float32.
  • Then, assert_eq doesn't actually check dtypes?! It turns out there is nothing in the assert_eq logic to check whether the dtypes of the two computed arrays match; there's just lots of logic checking that each one's meta matches its result. So the fact that ret and the expected np.array(1.0, dtype="f") have different dtypes is ignored, and the test passes. @jrbourbeau is this well-known? Should I open an issue about it? Seems like misleading behavior.

Since my changes now respect output_dtypes, the resulting array's meta (float32) does not match its computed dtype (float64, from the 1.0 input value), and assert_eq complained about that meta mismatch until I made the change now in this PR.

So basically, the old test was wrong: the output dtype in this case is in fact float64, so passing output_dtypes="f" should have resulted in an error from assert_eq about a meta mismatch, except that output_dtypes was ignored.
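The assert_eq gap described above can be illustrated with plain NumPy (a minimal stand-in, not dask's actual assert_eq logic): a value-only comparison like np.allclose passes even when the two arrays' dtypes disagree.

```python
import numpy as np

# Minimal illustration of the gap: equal values, mismatched dtypes.
a = np.array(1.0, dtype="float64")  # what apply_gufunc actually computed
b = np.array(1.0, dtype="float32")  # what the test expected ("f")

values_match = np.allclose(a, b)    # True: the values are equal
dtypes_match = a.dtype == b.dtype   # False: float64 != float32
```

A comparison that only looks at values reports a match here, which is why the test kept passing.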

Member

  • On main, despite saying output_dtypes="f" (float32), ret.dtype is float64! As I mentioned in the description, output_dtypes is currently actually ignored unless inferring the dtype from func fails. Since calling the foo function with a float64 succeeds (the 1.0 input becomes a NumPy float64), apply_gufunc returns a float64 array, even though you asked for float32.

To be honest, I'm not certain what exact behavior we want here, or what users expect. But if output_dtypes is specified, shouldn't apply_gufunc cast to the output_dtypes? Note that we do that in other places, such as partial_reduce. That would be, of course, only for the case where meta is computed automatically but the user wants to enforce output_dtypes nevertheless.
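The suggested cast could look roughly like this (a hedged sketch with hypothetical names; dask's real apply_gufunc is far more involved): compute the result, then enforce the requested dtype rather than using it only for meta.

```python
import numpy as np

def apply_with_cast(func, x, output_dtype=None):
    # Hypothetical sketch: after applying func, cast the result to the
    # user-requested dtype, mirroring what partial_reduce-style code does.
    out = func(x)
    if output_dtype is not None and out.dtype != np.dtype(output_dtype):
        out = out.astype(output_dtype)
    return out

res = apply_with_cast(np.negative, np.array([1.0]), output_dtype="f4")
```

With this behavior, `res.dtype` is float32 even though the input (and the raw result of `func`) is float64.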

  • Then, assert_eq doesn't actually check dtypes?! It turns out there is nothing in the assert_eq logic to check whether the dtypes of the two computed arrays match; there's just lots of logic checking that each one's meta matches its result. So the fact that ret and the expected np.array(1.0, dtype="f") have different dtypes is ignored, and the test passes. @jrbourbeau is this well-known? Should I open an issue about it? Seems like misleading behavior.

Interesting catch, and I would say we should indeed open an issue. I don't recall whether there's a reason we don't check that, but it's worth at least discussing, in case it is in fact a bug.

Member

I agree with your assessment that assert_eq doesn't check the two computed dtypes against each other. That'd be a nice thing to add :)

):
return x

if isinstance(x, list) or isinstance(x, tuple):
Collaborator Author

I moved this before the `not hasattr(x, "shape") or not hasattr(x, "dtype")` check, because otherwise it's a no-op: lists and tuples don't have shape/dtype, so we would always have returned from the function before reaching this check.
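The ordering issue can be sketched as follows (a hypothetical simplified helper, not dask's actual code): because lists and tuples lack `.shape`/`.dtype`, the early-return guard must come after the list/tuple branch, otherwise that branch is dead code.

```python
import numpy as np

def normalize_meta(x):
    # The list/tuple branch must come first: lists and tuples have no
    # .shape/.dtype, so the guard below would otherwise return them
    # before this recursion ever ran.
    if isinstance(x, (list, tuple)):
        return type(x)(normalize_meta(i) for i in x)
    if not hasattr(x, "shape") or not hasattr(x, "dtype"):
        return x
    return x  # real code would go on to build a proper meta here
```

With the guard first, `normalize_meta([np.arange(3)])` would return the list untouched instead of recursing into it.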

Comment on lines +159 to +162
# min/max functions have no identity, just use the same input type when there's only one
if len(args_meta) == 1 and "zero-size array to reduction operation" in str(e):
Collaborator Author

As noted in #7668. @pentschev, curious about your thoughts here. Personally, I don't think we should have this case at all; I don't think it's a safe assumption to make in generality for any user-defined function. But I do see the convenience.

Member

I agree with your assessment: it's indeed not a safe assumption for user-defined functions, and I would also prefer not to need this. However, without this condition, attempting to compute meta for a 0-D downstream array (e.g., CuPy) will fall into the general `except Exception` case and return None, which will cause meta to be a NumPy array and thus break those downstream use cases. The problem is we have no known alternative; that's why we check for this specific exception message. For any other corner cases we would have to extend this function to handle them properly, but user-defined functions should guarantee proper handling of 0-D arrays to avoid such issues.
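The fallback under discussion can be sketched like this (a hypothetical simplified helper, not dask's exact code): reductions without an identity, such as np.min, raise ValueError on a zero-sized meta array, so with a single input we reuse that input's empty array as the meta instead of giving up.

```python
import numpy as np

def infer_meta(func, *args_meta):
    # Simplified sketch of the special case: identity-less reductions
    # (min/max) fail on zero-size input, so fall back to the sole
    # input's meta rather than hitting a broad except and losing the
    # array type (e.g. a CuPy array would degrade to NumPy).
    try:
        return func(*args_meta)
    except ValueError as e:
        if len(args_meta) == 1 and "zero-size array to reduction operation" in str(e):
            return args_meta[0]
        raise
```

Here `infer_meta(np.min, np.empty((0,), dtype="f4"))` preserves the input's dtype and (empty) shape, while a well-behaved function like np.sum never triggers the fallback.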

Member

@pentschev pentschev left a comment

Thanks for pinging me here @gjoseph92. Overall this looks good to me; it seems like a more feature-complete manner of handling apply_gufunc. I've left a few comments. Assuming I haven't missed anything, I think test_apply_gufunc_pass_additional_kwargs is a potentially concerning test, so it may be worth checking whether some sneaky dtype casting is happening silently. Such cases can happen and can be difficult to find, but perhaps it's just me being overly cautious.

I've also verified that this PR doesn't break the CuPy tests we have (which aren't covered by CI yet), so this feels like a good solution. Thanks for working on it!

@jsignell
Member

Is this ready to be merged? Also I am planning on merging #6863 today as well - I think the two PRs address different issues.

@pentschev
Member

I'm fine with that @jsignell. I think the only unanswered portion here was the first part of #7669 (comment). It's only a matter of how we want/should treat output_dtypes when meta is computed automatically.

@gjoseph92
Collaborator Author

I think the question is whether output_dtypes should cause a cast. I'm leaning towards yes, because I think it's more intuitive. It's also more similar to np.vectorize, where giving otypes does ensure the function outputs that type.
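For comparison, np.vectorize with otypes does enforce the output type regardless of what the wrapped function returns:

```python
import numpy as np

# otypes=["f4"] makes np.vectorize produce a float32 result array,
# even though the wrapped function computes in float64.
vfunc = np.vectorize(lambda x: x + 1, otypes=["f4"])
out = vfunc(np.array([1.0], dtype="f8"))  # input is float64
```

Here `out.dtype` is float32, which is the behavior a cast-on-output_dtypes apply_gufunc would mirror.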

It's basically a question of whether output_dtypes is purely meant as a hint/annotation, to help dask determine the meta—and in general, we treat meta as something that's supposed to accurately simulate computation, but which does not affect computation in any way—or whether it's actually a request on the user's part.

However, I have very little preference: this works as-is, and based on the docstring for output_dtypes, I don't think the current behavior is wrong per se. So @jsignell, I'm happy if you want to merge this now. We could always add the cast in a separate PR in the future.

@jsignell jsignell merged commit f93bc86 into dask:main Jun 3, 2021
@mathause

mathause commented Jun 7, 2021

Unless I am missing something, you currently don't have a test for apply_gufunc passing both meta and output_dtypes, i.e. the newly raised ValueError is not tested.

@gjoseph92
Collaborator Author

@mathause good catch, there's no test for raising that error.
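The missing test could look roughly like this (a hypothetical sketch; `apply_gufunc_stub` is a stand-in reproducing only the validation this PR added, not dask's real apply_gufunc):

```python
import numpy as np

def apply_gufunc_stub(func, signature, *args, output_dtypes=None, meta=None):
    # Stand-in for the validation added in this PR: passing both
    # `output_dtypes` and `meta` is ambiguous, so it raises.
    if output_dtypes is not None and meta is not None:
        raise ValueError("Only one of `output_dtypes` and `meta` may be specified.")
    return func(*args)

def test_output_dtypes_and_meta_mutually_exclusive():
    try:
        apply_gufunc_stub(
            lambda x: x, "()->()", 1.0,
            output_dtypes=float, meta=np.empty((), dtype=float),
        )
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

In dask's test suite this would more idiomatically use `pytest.raises(ValueError)`, but the shape of the check is the same.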

Successfully merging this pull request may close these issues.

apply_gufunc bugs with meta inference
