-
-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make reachability code understand chained comparisons #7169
Make reachability code understand chained comparisons #7169
Conversation
A few meta notes:
|
Currently, our reachability code does not understand how to parse comparisons like `a == b == c`: the `find_isinstance_check` method only attempts to analyze comparisons that contain a single `==`, `is`, or `in` operator. This pull request generalizes that logic so we can support any arbitrary number of comparisons. It also along the way unifies the logic we have for handling `is` and `==` checks: the latter check is now just treated a weaker variation of the former. (Expressions containing `==` may do arbitrary things if the underlying operands contain custom `__eq__` methods.) As a side-effect, this PR adds support for the following: x: Optional[str] if x is 'some-string': # Previously, the revealed type would be Union[str, None] # Now, the revealed type is just 'str' reveal_type(x) else: reveal_type(x) # N: Revealed type is 'Union[builtins.str, None]' We previously supported this narrowing logic when doing equality checks (e.g. when doing `if x == 'some-string'`). As a second side-effect, this PR adds support for the following: class Foo(Enum): A = 1 B = 2 y: Foo if y == Foo.A: reveal_type(y) # N: Revealed type is 'Literal[Foo.A]' else: reveal_type(y) # N: Revealed type is 'Literal[Foo.B]' We previously supported this kind of narrowing only when doing identity checks (e.g. `if y is Foo.A`). To avoid any bad interactions with custom `__eq__` methods, we enable this narrowing check only if both operands do not define custom `__eq__` methods.
87f1571
to
90418df
Compare
reveal_type(x) # N: Revealed type is '__main__.Foo' | ||
reveal_type(y) # N: Revealed type is '__main__.Foo' | ||
|
||
if x == Foo.A == y: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does x == y == Foo.A work? (It feels like we ought to support this or I'm not sure how much this helps?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is an important case, and we should have a test for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! I have some comments here. Also in general adding more type restrictions to literals can cause some false positives (because of invariance or binder being too optimistic) so I think it is better to try this on our internal code first (if you will have time).
reveal_type(x) # N: Revealed type is '__main__.Foo' | ||
reveal_type(y) # N: Revealed type is '__main__.Foo' | ||
|
||
if x == Foo.A == y: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is an important case, and we should have a test for this.
else: | ||
reveal_type(x) # N: Revealed type is '__main__.Foo' | ||
reveal_type(y) # N: Revealed type is '__main__.Foo' | ||
[builtins fixtures/primitives.pyi] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add more tests for chaining. The PR title mentions chaining, but there is literally only one test about this.
In particular I would add tests for:
- Mixed kinds of comparisons in the chain like
x == 2 > y is 3
- Mixing chained comparisons using
and
/or
with other chain and other kinds likecallable()
andisinstance()
if eq_sym is None or ne_sym is None: | ||
return False | ||
return (eq_sym.fullname == 'builtins.object.__eq__' | ||
and ne_sym.fullname == 'builtins.object.__ne__') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks too strong to me. Wouldn't fullname.startswith('builtins.')
be enough (like we do for --strict-equality
)?
|
||
When in doubt, this function will conservatively bias towards | ||
returning False. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use this style*
"""Single line summary.
Longer description...
"""
* This review comment was sponsored by Jukka
identity comparison (left_expr is right_expr), not just an equality comparison | ||
(left_expr == right_expr). Identity checks are not overridable, so we can infer | ||
more information in that case. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also update docstring style here.
@@ -4615,6 +4681,32 @@ def is_private(node_name: str) -> bool: | |||
return node_name.startswith('__') and not node_name.endswith('__') | |||
|
|||
|
|||
def uses_default_equality_checks(typ: Type) -> bool: | |||
"""Returns 'true' if we know for certain that the given type is using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Returns 'true' if we know for certain that the given type is using | |
"""Returns 'True' if we know for certain that the given type is using |
# it already has the most precise type possible so does not need to | ||
# be narrowed/included in the output map. | ||
# | ||
# TODO: Generalize this to handle the case where 'other_type' is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this TODO moved somewhere or is it now fixed? Do we have regression test in the latter case?
""" | ||
|
||
# For the sake of simplicity, we currently attempt inferring a more precise type | ||
# for just one of the two variables. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How important is this? If we actually do this, I think it is better to use the type of right one to restrict the type of left one.
# enough that this is fine in practice. | ||
# | ||
# We could also probably generalize this block to strip away *any* singleton type, | ||
# if we were fine with a bit more unsoundness. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prepend the last sentence with TODO:
?
Just to clarify I mean something like this #4168 (comment) |
While working on overhauling python#7169, I discovered that simply just "deconstructing" enums into unions leads to some false positives in some real-world code. This is an existing problem, but became more prominent as I worked on improving type inference in the above PR. For example, consider the following program: ``` from enum import Enum class Foo(Enum): A = 1 B = 2 class Wrapper: def __init__(self, x: bool, y: Foo) -> None: if x: if y is Foo.A: # 'y' is of type Literal[Foo.A] here pass else: # ...and of type Literal[Foo.B] here pass # We join these two types after the if/else to end up with # Literal[Foo.A, Foo.B] self.y = y else: # ...and so this fails! 'Foo' is not considered a subtype of # 'Literal[Foo.A, Foo.B]' self.y = y ``` I considered three different ways of fixing this: 1. Modify our various type comparison operations (`is_same`, `is_subtype`, `is_proper_subtype`, etc...) to consider `Foo` and `Literal[Foo.A, Foo.B]` equivalent. 2. Modify the 'join' logic so that when we join enum literals, we check and see if we can merge them back into the original class, undoing the "deconstruction". 3. Modify the `make_simplified_union` logic to do the reconstruction instead. I rejected the first two options: the first approach is the most sound one, but seemed complicated to implement. We have a lot of different type comparison operations and attempting to modify them all seems error-prone. I also didn't really like the idea of having two equally valid representations of the same type, and would rather push mypy to always standardize on one, just from a usability point of view. The second option seemed workable but limited to me. Modifying join would fix the specific example above, but I wasn't confident that was the only place we'd needed to patch. So I went with modifying `make_simplified_union` instead. The main disadvantage of this approach is that we still get false positives when working with Unions that come directly from the semantic analysis phase. For example, we still get an error with the following program: x: Literal[Foo.A, Foo.B] y: Foo # Error, we still think 'x' is of type 'Literal[Foo.A, Foo.B]' x = y But I think this is an acceptable tradeoff for now: I can't imagine too many people running into this. But if they do, we can always explore finding a way of simplifying unions after the semantic analysis phase or bite the bullet and implement approach 1.
While working on overhauling python#7169, I discovered that simply just "deconstructing" enums into unions leads to some false positives in some real-world code. This is an existing problem, but became more prominent as I worked on improving type inference in the above PR. Here's a simplified example of one such problem I ran into: ``` from enum import Enum class Foo(Enum): A = 1 B = 2 class Wrapper: def __init__(self, x: bool, y: Foo) -> None: if x: if y is Foo.A: # 'y' is of type Literal[Foo.A] here pass else: # ...and of type Literal[Foo.B] here pass # We join these two types after the if/else to end up with # Literal[Foo.A, Foo.B] self.y = y else: # ...and so this fails! 'Foo' is not considered a subtype of # 'Literal[Foo.A, Foo.B]' self.y = y ``` I considered three different ways of fixing this: 1. Modify our various type comparison operations (`is_same`, `is_subtype`, `is_proper_subtype`, etc...) to consider `Foo` and `Literal[Foo.A, Foo.B]` equivalent. 2. Modify the 'join' logic so that when we join enum literals, we check and see if we can merge them back into the original class, undoing the "deconstruction". 3. Modify the `make_simplified_union` logic to do the reconstruction instead. I rejected the first two options: the first approach is the most sound one, but seemed complicated to implement. We have a lot of different type comparison operations and attempting to modify them all seems error-prone. I also didn't really like the idea of having two equally valid representations of the same type, and would rather push mypy to always standardize on one, just from a usability point of view. The second option seemed workable but limited to me. Modifying join would fix the specific example above, but I wasn't confident that was the only place we'd needed to patch. So I went with modifying `make_simplified_union` instead. The main disadvantage of this approach is that we still get false positives when working with Unions that come directly from the semantic analysis phase. For example, we still get an error with the following program: x: Literal[Foo.A, Foo.B] y: Foo # Error, we still think 'x' is of type 'Literal[Foo.A, Foo.B]' x = y But I think this is an acceptable tradeoff for now: I can't imagine too many people running into this. But if they do, we can always explore finding a way of simplifying unions after the semantic analysis phase or bite the bullet and implement approach 1.
While working on overhauling python#7169, I discovered that simply just "deconstructing" enums into unions leads to some false positives in some real-world code. This is an existing problem, but became more prominent as I worked on improving type inference in the above PR. Here's a simplified example of one such problem I ran into: ``` from enum import Enum class Foo(Enum): A = 1 B = 2 class Wrapper: def __init__(self, x: bool, y: Foo) -> None: if x: if y is Foo.A: # 'y' is of type Literal[Foo.A] here pass else: # ...and of type Literal[Foo.B] here pass # We join these two types after the if/else to end up with # Literal[Foo.A, Foo.B] self.y = y else: # ...and so this fails! 'Foo' is not considered a subtype of # 'Literal[Foo.A, Foo.B]' self.y = y ``` I considered three different ways of fixing this: 1. Modify our various type comparison operations (`is_same`, `is_subtype`, `is_proper_subtype`, etc...) to consider `Foo` and `Literal[Foo.A, Foo.B]` equivalent. 2. Modify the 'join' logic so that when we join enum literals, we check and see if we can merge them back into the original class, undoing the "deconstruction". 3. Modify the `make_simplified_union` logic to do the reconstruction instead. I rejected the first two options: the first approach is the most sound one, but seemed complicated to implement. We have a lot of different type comparison operations and attempting to modify them all seems error-prone. I also didn't really like the idea of having two equally valid representations of the same type, and would rather push mypy to always standardize on one, just from a usability point of view. The second option seemed workable but limited to me. Modifying join would fix the specific example above, but I wasn't confident that was the only place we'd needed to patch. So I went with modifying `make_simplified_union` instead. The main disadvantage of this approach is that we still get false positives when working with Unions that come directly from the semantic analysis phase. For example, we still get an error with the following program: x: Literal[Foo.A, Foo.B] y: Foo # Error, we still think 'x' is of type 'Literal[Foo.A, Foo.B]' x = y But I think this is an acceptable tradeoff for now: I can't imagine too many people running into this. But if they do, we can always explore finding a way of simplifying unions after the semantic analysis phase or bite the bullet and implement approach 1.
This diff adds support for the following pattern: ```python from typing import Enum, List from enum import Enum class Key(Enum): A = 1 B = 2 class Foo: key: Literal[Key.A] blah: List[int] class Bar: key: Literal[Key.B] something: List[str] x: Union[Foo, Bar] if x.key is Key.A: reveal_type(x) # Revealed type is 'Foo' else: reveal_type(x) # Revealed type is 'Bar' ``` In short, when we do `x.key is Key.A`, we "propagate" the information we discovered about `x.key` up one level to refine the type of `x`. We perform this propagation only when `x` is a Union and only when we are doing member or index lookups into instances, typeddicts, namedtuples, and tuples. For indexing operations, we have one additional limitation: we *must* use a literal expression in order for narrowing to work at all. Using Literal types or Final instances won't work; See python#7905 for more details. To put it another way, this adds support for tagged unions, I guess. This more or less resolves python#7344. We currently don't have support for narrowing based on string or int literals, but that's a separate issue and should be resolved by python#7169 (which I resumed work on earlier this week).
This diff adds support for the following pattern: ```python from typing import Enum, List from enum import Enum class Key(Enum): A = 1 B = 2 class Foo: key: Literal[Key.A] blah: List[int] class Bar: key: Literal[Key.B] something: List[str] x: Union[Foo, Bar] if x.key is Key.A: reveal_type(x) # Revealed type is 'Foo' else: reveal_type(x) # Revealed type is 'Bar' ``` In short, when we do `x.key is Key.A`, we "propagate" the information we discovered about `x.key` up one level to refine the type of `x`. We perform this propagation only when `x` is a Union and only when we are doing member or index lookups into instances, typeddicts, namedtuples, and tuples. For indexing operations, we have one additional limitation: we *must* use a literal expression in order for narrowing to work at all. Using Literal types or Final instances won't work; See python#7905 for more details. To put it another way, this adds support for tagged unions, I guess. This more or less resolves python#7344. We currently don't have support for narrowing based on string or int literals, but that's a separate issue and should be resolved by python#7169 (which I resumed work on earlier this week).
This diff adds support for the following pattern: ```python from typing import Enum, List from enum import Enum class Key(Enum): A = 1 B = 2 class Foo: key: Literal[Key.A] blah: List[int] class Bar: key: Literal[Key.B] something: List[str] x: Union[Foo, Bar] if x.key is Key.A: reveal_type(x) # Revealed type is 'Foo' else: reveal_type(x) # Revealed type is 'Bar' ``` In short, when we do `x.key is Key.A`, we "propagate" the information we discovered about `x.key` up one level to refine the type of `x`. We perform this propagation only when `x` is a Union and only when we are doing member or index lookups into instances, typeddicts, namedtuples, and tuples. For indexing operations, we have one additional limitation: we *must* use a literal expression in order for narrowing to work at all. Using Literal types or Final instances won't work; See python#7905 for more details. To put it another way, this adds support for tagged unions, I guess. This more or less resolves python#7344. We currently don't have support for narrowing based on string or int literals, but that's a separate issue and should be resolved by python#7169 (which I resumed work on earlier this week).
This diff adds support for the following pattern: ```python from typing import Enum, List from enum import Enum class Key(Enum): A = 1 B = 2 class Foo: key: Literal[Key.A] blah: List[int] class Bar: key: Literal[Key.B] something: List[str] x: Union[Foo, Bar] if x.key is Key.A: reveal_type(x) # Revealed type is 'Foo' else: reveal_type(x) # Revealed type is 'Bar' ``` In short, when we do `x.key is Key.A`, we "propagate" the information we discovered about `x.key` up to refine the type of `x`. We perform this propagation only when `x` is a Union and only when we are doing member or index lookups into instances, typeddicts, namedtuples, and tuples. For indexing operations, we have one additional limitation: we *must* use a literal expression in order for narrowing to work at all. Using Literal types or Final instances won't work; See #7905 for more details. To put it another way, this adds support for tagged unions, I guess. This more or less resolves #7344. We currently don't have support for narrowing based on string or int literals, but that's a separate issue and should be resolved by #7169 (which I resumed work on earlier this week).
This diff changes how we format Instances with a last known value when displaying them with `reveal_type`. Previously, we would always ignore the `last_known_value` field: ```python x: Final = 3 reveal_type(x) # N: Revealed type is 'builtins.int' ``` Now, we format it like `Literal[3]?`. Note that we use the question mark suffix as a way of distinguishing the type from true Literal types. ```python x: Final = 3 y: Literal[3] = 3 reveal_type(x) # N: Revealed type is 'Literal[3]?' reveal_type(y) # N: Revealed type is 'Literal[3]' ``` While making this change and auditing our tests, I also discovered we were accidentally copying over the `last_known_value` in a few places by accident. For example: ```python from typing_extensions import Final a = [] a.append(1) a.append(2) # Got no error here? reveal_type(a) # Incorrect revealed type: got builtins.list[Literal[1]?] b = [0, None] b.append(1) # Got no error here? reveal_type(b) # Incorrect revealed type: got builtins.list[Union[Literal[0]?, None]] ``` The other code changes I made were largely cosmetic. Similarly, most of the remaining test changes were just due to places where we were doing something like `reveal_type(0)` or `reveal_type(SomeEnum.BLAH)`. This changes should help greatly simplify some of the tests I want to write for python#7169 and also help make a somewhat confusing and implicit part of mypy more visible.
This diff changes how we format Instances with a last known value when displaying them with `reveal_type`. Previously, we would always ignore the `last_known_value` field: ```python x: Final = 3 reveal_type(x) # N: Revealed type is 'builtins.int' ``` Now, we format it like `Literal[3]?`. Note that we use the question mark suffix as a way of distinguishing the type from true Literal types. ```python x: Final = 3 y: Literal[3] = 3 reveal_type(x) # N: Revealed type is 'Literal[3]?' reveal_type(y) # N: Revealed type is 'Literal[3]' ``` While making this change and auditing our tests, I also discovered we were accidentally copying over the `last_known_value` in a few places by accident. For example: ```python from typing_extensions import Final a = [] a.append(1) a.append(2) # Got no error here? reveal_type(a) # Incorrect revealed type: got builtins.list[Literal[1]?] b = [0, None] b.append(1) # Got no error here? reveal_type(b) # Incorrect revealed type: got builtins.list[Union[Literal[0]?, None]] ``` The other code changes I made were largely cosmetic. Similarly, most of the remaining test changes were just due to places where we were doing something like `reveal_type(0)` or `reveal_type(SomeEnum.BLAH)`. The main motivation behind this diff is that once this lands, it should become much simpler for me to write some tests I'll need while revamping #7169. It also helps make a somewhat confusing and implicit part of mypy internals more visible.
This pull request is v2 (well, more like v10...) of my attempts to make our reachability code better understand chained comparisons. Unlike python#7169, this diff focuses exclusively on adding support for chained operation comparisons and deliberately does not attempt to change any of the semantics of how identity and equality operations are performed. Specifically, mypy currently only examines the first two operands within a comparison expression when refining types. That means the following expressions all do not behave as expected: ```python x: MyEnum y: MyEnum if x is y is MyEnum.A: # x and y are not narrowed at all if x is MyEnum.A is y: # Only x is narrowed to Literal[MyEnum.A] ``` This pull request fixes this so we correctly infer the literal type for x and y in both conditionals. Some additional notes: 1. While analyzing our codebase, I found that while comparison expressions involving two or more `is` or `==` operators were somewhat common, there were almost no comparisons involving chains of `!=` or `is not` operators, and no comparisons involving "disjoint chains" -- e.g. expressions like `a == b < c == b` where there are multiple "disjoint" chains of equality comparisons. So, this diff is primarily designed to handle the case where a comparision expression has just one chain of `is` or `==`. For all other cases, I fall back to the more naive strategy of evaluating each comparision individually and and-ing the inferred types together without attempting to propagate any info. 2. I tested this code against one of our internal codebases. This ended up making mypy produce 3 or 4 new errors, but they all seemed legitimate, as far as I can tell. 3. I plan on submitting a follow-up diff that takes advantage of the work done in this diff to complete support for tagged unions using any Literal key, as previously promised. (I tried adding support for tagged unions in this diff, but attempting to simultaneously add support for chained comparisons while overhauling the semantics of `==` proved to be a little too overwhelming for me. So, baby steps.)
Superseded by #8148. |
This pull request is v2 (well, more like v10...) of my attempts to make our reachability code better understand chained comparisons. Unlike #7169, this diff focuses exclusively on adding support for chained operation comparisons and deliberately does not attempt to change any of the semantics of how identity and equality operations are performed. Specifically, mypy currently only examines the first two operands within a comparison expression when refining types. That means the following expressions all do not behave as expected: ```python x: MyEnum y: MyEnum if x is y is MyEnum.A: # x and y are not narrowed at all if x is MyEnum.A is y: # Only x is narrowed to Literal[MyEnum.A] ``` This pull request fixes this so we correctly infer the literal type for x and y in both conditionals. Some additional notes: 1. While analyzing our codebase, I found that while comparison expressions involving two or more `is` or `==` operators were somewhat common, there were almost no comparisons involving chains of `!=` or `is not` operators, and no comparisons involving "disjoint chains" -- e.g. expressions like `a == b < c == b` where there are multiple "disjoint" chains of equality comparisons. So, this diff is primarily designed to handle the case where a comparison expression has just one chain of `is` or `==`. For all other cases, I fall back to the more naive strategy of evaluating each comparison individually and and-ing the inferred types together without attempting to propagate any info. 2. I tested this code against one of our internal codebases. This ended up making mypy produce 3 or 4 new errors, but they all seemed legitimate, as far as I can tell. 3. I plan on submitting a follow-up diff that takes advantage of the work done in this diff to complete support for tagged unions using any Literal key, as previously promised. (I tried adding support for tagged unions in this diff, but attempting to simultaneously add support for chained comparisons while overhauling the semantics of `==` proved to be a little too overwhelming for me. So, baby steps.)
Currently, our reachability code does not understand how to parse comparisons like
a == b == c
: thefind_isinstance_check
method only attempts to analyze comparisons that contain a single==
,is
, orin
operator.This pull request generalizes that logic so we can support any arbitrary number of comparisons.
It also along the way unifies the logic we have for handling
is
and==
checks: the latter check is now just treated a weaker variation of the former. (Expressions containing==
may do arbitrary things if theunderlying operands contain custom
__eq__
methods.)As a side-effect, this PR adds support for the following:
We previously supported this narrowing logic when doing equality checks (e.g. when doing
if x == 'some-string'
).As a second side-effect, this PR adds support for the following:
We previously supported this kind of narrowing only when doing identity checks (e.g.
if y is Foo.A
).Note that to avoid any bad interactions with custom
__eq__
methods, we enable this narrowing check only if both operands do not define custom__eq__
methods.