-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add typing to unary and binary arithmetic operators #4904
Conversation
Thanks a lot @rhkleijn , this is very impressive! The results look great. The Did you ever consider replacing the injections with this macro-like approach to writing out the python methods, rather than using the macro to generate the stub files? (I don't bring this up with a view that it's better, but the injections have always seemed awkward) Is there anything we should be aware of around the interactions between stub files and inline definitions? Others will have useful thoughts too, so would be interested to hear from them too. |
Thanks for the PR & your work. I'll need some time to digest that all... One thing I certainly would like to fix before merging this is #4881 (and merge #4878). I'd also be interested in your motivation for the stub files. (It's just that we don't have any of these so far). I am sure you saw this comment: Lines 3 to 5 in 3681b65
so there would be an argument to create mixin-classes*. Another motivation would be that currently (with the * Well I just realised that your stub files are kind of mixin classes...
|
I did not consider it but that is a very promising idea. It is probably feasible to adapt the script to generate the methods themselves together with their type hints into a regular Python module instead of a stub file. I will give it a try this week. It would have the same disadvantage of needing the script for generating the file. There are several advantages: the stub file approach which may feel a bit odd would not even be necessary, it could replace the current injection logic, implementation and type hints would be in a single place and it could simplify the class hierarchy a bit.
The mypy documentation states that if a directory contains both a .py and a .pyi file for the same module, the .pyi file takes precedence. It is my understanding that stub files work on a file by file basis and you cannot mix them for a single file. There should be no interactions here, as a _typed_ops.py module doesn't even exist in this case, only the _typed_ops.pyi stub file. |
I agree. Both PRs you mention would be nice to see merged. I was still using numpy 1.19.
I usually prefer inline type hints. The motivation for stub files was mostly because all other approaches I tried where not sufficient, see my comment in #4054. It turns out we might not even need a stub file, see my reply above to max-sixty's comments.
That is correct. It can currently be considered a mixin class during type checking and a no-op at compile and runtime. If I can get max-sixty's suggestion to work, then we would quite literally be following up on that TODO by using mixin classes instead of the unintuitive "inject" functions.
Those are mypy warning for overlapping overload signatures with incompatible return types. In practice it is not a problem as the first matching overload gets picked. One reason for overlapping is that with numpy <=1.20
Agreed. What name would work in this case? Something like
I think it would be nice to allow some simple kinds of subclassing (some (many?) methods already preserve the subclass in the return value but the type hint mentions the base class as return type). The particular use case I have in mind involves having a custom Accessor registered with
I will also bring this use case up in #3980. Until that is resolved, I am open to here use either bound TypeVars or just "Dataset" and "DataArray". |
Thanks for your thought-out answer. That all sounds good to me.
So that works now? I remember vaguely that @max-sixty (?) had an issue with this pattern once.
I vote for
Yes, I think that's a problem of the |
@rhkleijn I'm a bit late to respond here, but that could be really excellent IMO, I'd be really interested to see that working. Let me know if there's anything from my end that would be helpful. |
Done Locally it seems to work fine. Let's see what happens in CI now.
Not yet. I am working on generating the methods themselves on a local branch. It required changing all |
You can now view the branch I mentioned in my previous message at master...rhkleijn:gen-ops . It would get rid of the need for dynamically generating (injecting) unary and binary operators at runtime. Note that its script generates both the module containing the arithmetic methods and its accompanying stub file. This setup turned out to be more robust than both:
In all cases mypy and pylance work nicely. I prefer the Also, I am not sure what to do with the |
The I have a few questions, but they don't need to hold back merging — if others agree — in particular on the
Using python to template the methods works pretty well! Better than I would have expected relative to a templating library like jinja. |
thanks!
It takes precedence only during type checking as the source for static type information. The generated
There seem to be quite some open issues here with no indication they currently are actively working on fixing them.
For these unary ops with simple type hints if would probably work but for most other methods with multiple overloads the typing ecosystem is not yet mature enough. In this issue I have commented on it. I hope one day we will be able to refactor this into some generic parent classes (one for unary ops, one for binary (and reflexive) ops and one for inplace ops) from which we can inherit. Using a generic parent class only for the unary ops and not for the others would make the script more complicated. For now, I'd say this somewhat low-tech solution gets the job done.
Unfortunately we have to define them both ways for typing purposes. This comment in
Also here, my hope is that someday this will not be needed anymore.
The
I was also pleasantly surprised by it. And even black leaves the generated files unchanged. |
Thanks a lot for the full response! I forgot to reply to one question of yours — let's use this PR, if you want to replace the current code with the
Ofc, I see, there's no type information in the
Thanks for the comments — even better to have them in the code. I still have one question — if we remove a line like this, does mypy not recognize it can use the method on a class DataArrayOpsMixin:
def __add__(self, other: T_Dataset) -> T_Dataset: ...
That's surprising — I wonder where it's getting that information from. I would have expected that it wouldn't be able to find any compatible implementation of a
I would vote not to prioritize a tool if it's not compatible — this code will last for a long time, and I expect PyCharm will catch up to the language specs soon.
Great, thank you |
done
I have tried that but PyCharm would not handle the
If we remove that line, it incorrectly infers the result of
Agreed that this is surprising behaviour. I suspect mypy might have something like an implicit internal rule to fallback to the type of the first operand of a binary operator. That would be consistent with the behaviours of both of the above cases.
I can relate to that. On the other hand, there may also be a pragmatic reason to go the extra mile. If we don't it may result in users thinking xarray's newly added typing is to blame for unexpected type inferences and reporting issues here. When at some point in the future these tools indeed do catch up it is straightforward to adapt the script and regenerate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had another look at the PR - I think it also makes it easier to understand where the unary & binary operations are coming from. The next step will be to get rid of the other inject_*
functions (not here).
I suggest you add an entry to whats-new
and mark it as "Ready for review". Would be good to get another pair of eyes on it.
xarray/core/_typed_ops.pyi
Outdated
|
||
Scalar = Union[int, float, complex, bytes, str, np.generic] | ||
ArrayLike = Union[np.ndarray, DaskArray] | ||
ScalarOrArray = Union[Scalar, ArrayLike] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following works
da = xr.DataArray([1, 2, 3])
da + [1, 2, 3]
so I wonder if you have to add ArrayLike
from np.typing
here (need to add it to core/npcompat.py
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea! I have now added it to npcompat.py
including a fallback for installation with older numpy versions. The fallback is based on code from numpy 1.20. Numpy's ArrayLike
already includes Python scalars so that simplifies the type definitions we need at the top of the generated file.
@@ -121,20 +116,6 @@ | |||
None. Changed in version 0.17.0: if specified on an integer array | |||
and skipna=True, the result will be a float array.""" | |||
|
|||
_COARSEN_REDUCE_DOCSTRING_TEMPLATE = """\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did that go? Or was this superfluous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was not used.
@rhkleijn I missed your last comment, apologies.
That makes sense, and it sounds like you've checked these cases in lots of detail, thank you for all your hard work. If you want, add a comment to the code that references your comments here, so there's some record of the research you've done. (Optional ofc) Let's answer @mathause questions, and then you should feel free to merge unless there are any objections — we may learn more from others using it — and very easy to iterate more from here. |
I'll have to take a closer look, but overall this looks amazing! Both a nice internal clean-up and a valuable feature for Xarray users. |
Thanks for having a glance @shoyer . Let us know when it's OK to merge! |
I'll merge this in the next couple of days unless I hear anything! Thanks everyone |
Hello @rhkleijn! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
|
FYI I resolved conflicts in GitHub; I may have made a mistake, tests should tell us though |
Thanks, there was one more minor merge artifact. Should be fixed now. |
@rhkleijn thanks for the huge effort! We just discussed this on our team call and were universally impressed. |
pre-commit run --all-files
whats-new.rst
This draft PR is for soliciting feedback on an approach to adding typing to unary and binary operators for xarray classes.
The approach involves generating a stub file using a simple script.
In order to untangle the various helper classes implementing arithmetic throughout the code base I have refactored the code, thereby getting rid of the injection of the operators after class definition. It now inserts the operators during construction of the subclasses leveraging Python's
__init_subclass__
mechanism.For this example python file:
the status quo gives this mypy output:
With this PR the output now becomes: