Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disambiguate booleans on conditions ("and" & "or" operators) #1698

Closed
dmoisset opened this issue Jun 11, 2016 · 11 comments
Closed

disambiguate booleans on conditions ("and" & "or" operators) #1698

dmoisset opened this issue Jun 11, 2016 · 11 comments

Comments

@dmoisset
Copy link
Contributor

I've noticed many false positive type errors related to "and" and "or" operators because they have the tendency to create union types. I'm working with some program which I reduced to the following example:

REGEX = re.compile(r'(a*)(b*)(c*)')

def example(s: str, ignore_matches: bool) -> int:
    match = not ignore_matches and REGEX.search(s)
    if match:
        x = match.group(1) or match.group(3)
        y = match.start(2)
        return len(x) + y

mypy figures out that the type of match is Union[bool, re.Match] (re.Match actually doesn't exist, so I can't workaround this with an isinstance() check here but this is not the main point here). So inside the if, I get:

foo.py:9: error: Some element of union has no attribute "group"
foo.py:10: error: Some element of union has no attribute "start"

I understand why this happens, but I'm thinking that this could be analyzed statically, if mypy actually has some "always True" and "always False" types as subtypes of "Boolean". This could help in "and" and "or" expressions. The type of the expression above would be Union[FalseType, re.Match]. Which can be analyzed in an if removing the "FalseType" option from the Union, which gives an accurate typing of re.Match.

A dual situation can be found if you have boolexp or other_exp which could be typed as Union[TrueType, other_type].

A possible generalization of this is having additional type information for all types saying "can be true-ish", "can be false-ish"; and that type information can be narrowed down on boolean operators and ifs.

On first sight, these ideas may look like going beyond the idea of a type checker and more into a program verifier but that's not actually what I'm proposing; the thing is that in actual programs this kind of checks are actually a way to narrow down union types, and my increase the effectiveness of mypy as a typechecker.

@gvanrossum
Copy link
Member

Mypy does draw conclusions from Boolean operators and conditional statements, but it has limited understanding. Maybe only a slight refinement is needed? Looking at the code it seems that match could have the following values:

  • False, if ignore_matches is True
  • None, if no match is found
  • a Match object, if a match is found

Then the if match: condition should be made to understand that the first bullets cause the block to be skipped, so inside the block it's always a Match.

I'm not sure exactly how to fix this but I think it is in principle possible.

It's possible that there's a complication because the current stubs for re.pyi claim that search() returns a Match -- it should be an Optional[Match]. Once strict optional checking lands that should be fixed; but it would be fine to fix now. I don't think it's the cause of this problem though -- I'm just warning that fixing it (and strict optional checking) may interfere with any fix you could come up with for the issue at hand.

@JukkaL
Copy link
Collaborator

JukkaL commented Jun 13, 2016

Having separate, internal types for True and False (subtypes of bool) might be the easiest way to fix the issue. This could come in handy in a few other places as well, at least if these would be promoted to user-level concepts. For example, it might be useful to support overloading based on the value of a boolean argument. I would propose the types to be called just True and False, similar to how the type of None is called None.

@rwbarton
Copy link
Contributor

@pdmccormick suggested the same thing at PyCon in the context of #1477. I like the idea, but we'd need to find a way to continue to infer the types bool or List[bool] for variables initialized to False or [False] (rather than the types False or List[False]), while we'd also need the inferred type of False to be False in order to use functions overloaded on the value of a boolean argument.

@JukkaL
Copy link
Collaborator

JukkaL commented Jun 13, 2016

We also discussed the introduction of types like ASCIIUnicode at PyCon. These would be semantically similar to True/False types. ASCIIUnicode would include unicode objects that only have 7-bit characters. I think that the technical term for this concept is 'refinement type'.

I agree that when inferring types of variables, mypy should probably promote all refinement types to the corresponding non-refinement type when there is no explicit type annotation.

@ddfisher
Copy link
Collaborator

I'm not sure we want these to be subtypes of bool, actually, because many other types are also used for their truthiness. Instead, I think we should track the truthiness of all objects. Tracking truthiness universally should be both simpler to implement and more broadly useful.

@dmoisset
Copy link
Contributor Author

@ddfisher I agree on the "broadly useful" aspect, but I'm not sure about the "simpler" to implement. This should require ad-hoc rules for builtins (for example, [] is falsish, list.append always makes a list true-ish). It would help in and and or expressions probably, perhaps the rest of the operations can probably leave truthiness to "unknown"?

@ddfisher
Copy link
Collaborator

Yeah, I agree. I didn't intend to suggest that we should track truthiness in all circumstances. I was thinking more along the lines of your initial example when ignore_matches is something other than a bool.

@dmoisset
Copy link
Contributor Author

OK, I'm starting to work on this, given that it's not a trivial change. Here's an outline of what I'm doing in case you want to suggest a different approach:

  1. I'll define two functions can_be_true(t: Type) -> bool and can_be_false. For most cases they both return True, but
    • can_be_true(Void) == False
    • can_be_true(UninhabitedType) == False
    • can_be_true(empty tuple type) == False
    • can_be_false(non empty tuple) == False
      The only possible type returning False for both functions should be UninhabitedType
  2. I'll add a couple of object decorators (wrappers) TrueOnly(t: Type) -> Type and FalseOnly(t: Type) -> Type. The idea of this is that TrueOnly(t) behaves like t, except that can_be_false(TrueOnly(t)) == False, and symetrically with FalseOnly. Some other details are that:
    • TrueOnly(t) == UninhabitedType when not can_be_true(t)
    • TrueOnly(t) == t when not can_be_false(t)
    • TrueOnly(Union[t1, t2, ...]) == Union[TrueOnly(t1), TrueOnly(t2), ...]
    • and all the symetrical things for FalseOnly
    • also, Union[TrueOnly(t1), FalseOnly(t1), t2, ...] == Union[t1, t2, ...]
    • Note that you may end up with UninhabitedType inside unions, that disappear when simplifying unions because they are in the bottom of the tree.
  3. These wrappers are used to restrict the types of boolean operators. Given p: T and q: U, then
    • if T is UninhabitedType, p and q, p or q and not p are also UninhabitedType
    • if not can_be_true(T) then p and q: T, p or q: U, not p: TrueOnly(bool)
    • if not can_be_false(T) then p and q: U, p or q: T, not p: FalseOnly(bool)
    • otherwise, p and q: Union[FalseOnly(T), U], p or q: Union[TrueOnly(T), U]
  4. with all of these, we can make a small change to the find_isinstance_check (which probably should be renamed at this point). If a condition is a literal of type T, we add TrueOnly(T) to the "if" TypeMap, and FalseOnly(T) to the "else" typemap. The rest of the logic on that function to handle negations and logical operators can stay the same, and if the rules above are follow all the information should propagate.

The following is an example:
In the original code of this issue, match will get the type Union[FalseOnly(bool), re.Match] according to rule 3. Inside the if match, following rule 4, the context will map match to TrueOnly(Union[FalseOnly(bool), re.Match]). Because of 2, that's equivalent to Union[TrueOnly(FalseOnly(bool)), TrueOnly(re.Match)]. The first element of the union is, also using rule 2, Union[UninhabitedType, TrueOnly(re.Match)]. Given that a Union of a type with a subtype absorbs the smaller type, this is equal to Union[TrueOnly(re.Match)], i.e. TrueOnly(re.Match), So inside the if branch, it's valid to call match.group

@ddfisher
Copy link
Collaborator

I see a few issues with this approach. At a high-level, they are:

  • isinstance is used widely inside mypy. How are you planning on dealing with that in the TrueOnly and FalseOnly wrappers?
  • A number of your equivalences in 2) seem off. For example, NoneType can never be true, and Unions with mixed TrueOnlys and FalseOnlys shouldn't have the TrueOnlys and FalseOnlys discarded.

I'd consider adding an Optional[bool] to Instance instead of the wrapper objects. Aside from what I said above, I think you have the right general idea.

@dmoisset
Copy link
Contributor Author

dmoisset commented Aug 5, 2016

I pushed a PR for this at #1989 It handles this case at least. I'll take a look to related issues to see if they improved with this

@gnprice gnprice added this to the 0.4.x milestone Aug 11, 2016
ddfisher pushed a commit that referenced this issue Aug 16, 2016
This is an implementation of my proposed fix for #1698 ( #1698 (comment) )

Relevant differences from the description there:

After the suggestion from @ddfisher I used class attributes in Type instead of wrapping Type objects. When I want to restrict a type I make a copy and override the can_be_true or can_be_false with instance attributes
I didn't implement the rules for not after realising that I couldn't find a relevant example where it helped. It's easy to add later if we discover otherwise but I didn't want to bloat an already large PR
I noticed that the deductive power was good enough to start detecting more dead code in unrelated test cases (see modifications in check-statements, check-modules and check-generics).
There are some other improvements that I saw that could be made but I decided to stop here; if you think any of these are necessary I can add them:

Covering the "not" operator, as mentioned above
Renaming "find_isinstance_check" to something like "types_specialized_by_condition" which is more descriptive of its current purpose.
I used copy.copy() followed by an update to create restricted version of types, perhaps there's a better/safer way to copy those (and even cache the restricted versions to reduce memory usage)
@gvanrossum
Copy link
Member

gvanrossum commented Aug 17, 2016

Closing since #1989 fixed this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants