Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assignment of Any does not get rid of Optional (add AnyUnion type?) #3526

Closed
ddfisher opened this issue Jun 12, 2017 · 29 comments
Closed

Assignment of Any does not get rid of Optional (add AnyUnion type?) #3526

ddfisher opened this issue Jun 12, 2017 · 29 comments
Assignees

Comments

@ddfisher
Copy link
Collaborator

Consider this code (run with --strict-optional):

from typing import Optional
def f():
  # untyped function; often from silent import
  return 0  # always returns an int

def g(x: Optional[int]) -> int
  if x is None:
    x = f()
  return x  # E: Incompatible return value type (got "Optional[int]", expected "int")

Mypy doesn't infer that x is not None in g. I think this a bug -- mypy
should be permissive here.

If we wanted to fix this, we'd run into a problem: what type should x have
after x = f()? We don't want it to be Any, because we don't want that
behavior in the general case. It shouldn't be int -- f() could be intended
to unconditionally return None instead. To fix this, I think we need to
introduce a new special type, which I will call AnyUnion. AnyUnion[A, B, C] would represent a type which is allowed to be used as any of A, B, or
C (similarly to how Any is allowed to be used as any type). In this case,
we could have x's type bound to AnyUnion[int, None], which would fix this
error.

Other helpful uses of AnyUnion:

  • pow will no longer need to return Any in builtins. Instead, it could
    return the much less broad AnyUnion[int, float].
  • AnyUnion[X, None] would provide a per-variable opt-out of strict None
    checking. This could be helpful for classes with delayed initialization where
    a lot of effort would be needed to refactor them into a state that would work
    properly with strict None checking.

Cons:

  • Additional complexity of the type system affects implementation complexity and
    makes things slightly harder for users to understand.
  • The motivating problem also exists with subclasses (with the way the binder
    currently works), but AnyUnion doesn't help much with that.
@ilevkivskyi
Copy link
Member

pow will no longer need to return Any in builtins. Instead, it could
return the much less broad AnyUnion[int, float].

But doesn't #3501 (or similar) provides a better solution?

AnyUnion[X, None] would provide a per-variable opt-out of strict None
checking. This could be helpful for classes with delayed initialization where
a lot of effort would be needed to refactor them into a state that would work
properly with strict None checking.

Do I understand correctly that this has been solved by PEP 526?

In general, I don't think that we really need this feature. What worries me is that in addition to making type system more complex it also can break transitivity of subtyping (or maybe better call this compatibility). IIUC your proposal, then int <: AnyUnion[int, str] <: str. Currently, we have only one special type that does this -- Any. Having many (user defined) types that break transitivity might lead to unexpected consequences.

@ilevkivskyi
Copy link
Member

Do I understand correctly that this has been solved by PEP 526?

Probably it was only partially fixed by PEP 526. But for remaining cases one can just do something like:

UNINITIALIZED: Any = None

class C:
    def __init__(self) -> None:
        self.complicated: int = UNINITIALIZED
        # existing code below
        while self.complicated is None:
            ...

@gvanrossum
Copy link
Member

I'm not sure this is needed all that often, but if we do, we could just spell it Any -- Any[A, B, C] is acceptable as a A, B or C, while plain Any is unchanged. Alternatively, maybe this type is really the same as Intersection (python/typing#213)?

Regardless I'm not sure that in your original motivating example we should do anything different. If you want to signal at the call site that f() returns an int, you can write

def g(x: Optional[int]) -> int
  if x is None:
    y = f()  # type: int
    x = y
  return x  # OK

@JukkaL
Copy link
Collaborator

JukkaL commented Jun 13, 2017

This is probably the same feature as 'unsafe unions' that have been discussed before (though I can't find the discussion -- maybe it was in person or over email). Unsafe unions are different from Intersection. For example, unlike intersections, compatibility goes both ways: int is compatible with AnyUnion[int, str] and vice versa.

My take is that this feature would complicate mypy significantly. Normal union types are already a very complicated feature, and they still aren't even fully implemented. It would be a likely cause of some user confusion, since we'd have two pretty similar but different concepts -- union types and these any/unsafe unions. If we decide to add intersection types at some point, there would be even more potential for confusion, since they are also similar in some ways to any unions. All in all, I don't think that there is enough evidence of the usefulness of the feature to support implementing it.

@JukkaL JukkaL changed the title Add AnyUnion type Assignment of Any does not get rid of Optional (add AnyUnion type?) Jun 23, 2017
@JukkaL
Copy link
Collaborator

JukkaL commented Jun 23, 2017

Solving the original issue is a priority because this may come up often when using ignored imports. I'm not convinced that adding AnyUnion is the right course of action, though.

@ilevkivskyi
Copy link
Member

Concerning the original example:

from typing import Optional
def f():
  # untyped function; often from silent import
  return 0  # always returns an int

def g(x: Optional[int]) -> int
  if x is None:
    x = f()
  return x

what is wrong with x having type Union[Any, int] after the if block? Thus seems logical, initially it was Union[None, int] but then the None is replaced by Any in the if block by x = f(). Fortunately, such unions are not simplified anymore:

x: Union[int, Any]
reveal_type(x) # Revealed type is 'Union[builtins.int, Any]'
x.whatever # Error: Item "int" of "Union[int, Any]" has no attribute "whatever"

Also, Union[int, Any] is a (non-proper) subtype of int so that there should be no error in the original example.

@ddfisher
Copy link
Collaborator Author

@ilevkivskyi: That doesn't work, as variables don't change type in general. In particular, we want to maintain this behavior:

x = 0  # type: int
x = any()
reveal_type(x)  # int

@ilevkivskyi
Copy link
Member

@ddfisher

as variables don't change type in general

But there is the type binder that can already do this:

x: Optional[int]

if x is None:
    x = int()

reveal_type(x)  # Revealed type is 'builtins.int'

Why can't it be special-cased to re-bind to Any inside unions? (I agree that binding to Any in general is clearly bad.)

@ddfisher
Copy link
Collaborator Author

Unless there's a really good reason, Unions should work as similarly as possible to non-Unions. I don't think this is a strong enough reason.

The binder only binds to subtypes of the declared type. If we change that, the declared type starts to become pretty meaningless.

@gvanrossum
Copy link
Member

gvanrossum commented Jun 23, 2017 via email

@JukkaL
Copy link
Collaborator

JukkaL commented Jun 28, 2017

Always rebinding a variable to Any is not such an outlandish idea. Here is an example:

def f(x: Iterable[int]) -> None:
    x = make_sparse_vector(x)
    x[0] = 1  # not valid for Iterable but okay for vector

def make_sparse_vector(iterable): 
    ...  # returns a vector object

This currently generates a false positive error. I'd argue that it would be more correct if the example did not generate errors. The type of x after the assignment would be Any. This would apply to the original example as well, as proposed by @ilevkivskyi above, but we wouldn't special case unions:

def g(x: Optional[int]) -> int
    if x is None:
        x = f()
        # type of x is Any here
    # type of x is Union[int, Any] here
    return x  # okay

Yes, this change would generate some additional false negatives. It would also fix false positives and be consistent with the philosophy of gradual typing. For gradual typing to work seamlessly, changing a non-Any type to Any should not generate errors -- i.e., it should be possible to annotate a codebase in an arbitrary order, and it should be possible to ignore arbitrary imports.

@gvanrossum
Copy link
Member

gvanrossum commented Jun 28, 2017

This seems very attractive. I have one concern though.

Suppose I have a file-ish object whose readline() method [update:] is declared as Any but at runtime returns either str or bytes depending on some flag setting. Suppose I know the flag setting (but mypy doesn't) and I know it will always return bytes. Can I still write this?

line = b''  # say it's previously used
line = f.readline()
a = line.decode('utf8')  # will infer str

Or will line become Any, causing a also to become Any?

@ilevkivskyi
Copy link
Member

On a second thought, I also think it is not such a terrible idea to rebind to Any. The point is that we already have several flags to "control" Any (and probably will have few more). So that the possible false negatives will not be completely unnoticed.

@JukkaL
Copy link
Collaborator

JukkaL commented Jun 28, 2017

Or will line become Any, causing a also to become Any?

Yes. You'd have to use something like line = cast(bytes, f.readline()) or line2 = f.readline() # type: bytes instead to get more typing precision. Or you can annotate/cast f to be IO[bytes]. And I agree with @ilevkivskyi that the options for controlling Any types (and type precision reports) would be helpful.

Once we can easily calculate the typing precision of a program, it would be easy enough to run an experiment to determine the cost in type checking precision this change would cause for any particular real-world codebase.

@gvanrossum
Copy link
Member

Hm, I really don't like the cast. I've recommended using assignment to a variable combined with a type annotation to avoid casts many times. My reasoning is that if in the future the type of readline() becomes more precise, and there is in fact a bug in how the program uses its return value, using cast() will hide the bug, while using an annotated variable will reveal it.

@JukkaL
Copy link
Collaborator

JukkaL commented Jun 28, 2017

Agreed that we shouldn't recommend the cast. Using a temporary variable would be my favorite approach if it's impractical to add an annotation for f:

line = b''  # say it's previously used
next_line = f.readline()  # type: bytes
line = next_line
a = line.decode('utf8')  # will infer str

@gvanrossum
Copy link
Member

OK, so maybe that would then run up to #2008...? I guess we should double-check that the solution chosen there doesn't invalidate this use case.

@ddfisher
Copy link
Collaborator Author

Proposal 3 from #2008 would prevent this solution (or would get very special-casey).

I'm not a fan of always rebinding a variable to Any. I think it would cause the number of Anys present in codebases (existing and future) to expand dramatically. From what I've seen in internal code, people commonly don't know when functions they call return Any (generally due to unfollowed imports), so they wouldn't know to use the temporary variable solution recommended above (and it's almost certainly not something done currently). This should be testable with --any-exprs-report and a prototype implementation.

@JukkaL
Copy link
Collaborator

JukkaL commented Jun 29, 2017

I think it would cause the number of Anys present in codebases (existing and future) to expand dramatically.

I don't expect this to be the case, as in the steady state codebases are expected to have a high annotation coverage, and for partially annotated code this would only affect a relatively small fraction of assignments. This would have some temporary effect during the migration from originally unannotated code, and even then there would be a trade-off: this would simplify migration by generating fewer false positives at the cost of less precise type checking. Anyway, we can run experiments to estimate the impact, we don't just have to guess.

And yes, this would conflict with proposal 3 in #2008 -- it would need tweaks. On the other hand, the AnyUnion proposal would not help with the vector example from above:

def f(x: Iterable[int]) -> None:
    x = make_sparse_vector(x)
    # type would still be Iterable[int]
    x[0] = 1  # ouch

def make_sparse_vector(iterable): 
    ...  # returns a vector object

We also have the option of using a quick hack that would work around the original issue: special case assignment of Any to a variable with bound type None. We could infer a union where None is replaced with Any:

from typing import Optional

def f():...

def g(x: Optional[int]) -> int
    if x is None:
        x = f()
        reveal_type(x)   # Union[int, Any]
    reveal_type(x)  # Union[int, Any]
    return x  # Ok

Implementing AnyUnion is too much effort and extra complexity for what I'd expect to be a minor gain in type checking coverage. So let's leave this issue undecided for now, or at most implement a simple workaround such as the one I described. We can get back to this once we have more data, such as more users hitting this issue, or an analysis of the impact of allowing rebinding with Any type.

@ilevkivskyi
Copy link
Member

And yes, this would conflict with proposal 3 in #2008 -- it would need tweaks.

The option number 3 already mentions

Unclear interactions with Any types.

as a downside. Maybe I am missing something, but this could be a minor tweak. Something like: types are always bound on initial assignments unless r.h.s. is Any.

@ilevkivskyi
Copy link
Member

@JukkaL

So let's leave this issue undecided for now, or at most implement a simple workaround such as the one I described. We can get back to this once we have more data, such as more users hitting this issue

It looks like #3724 is another example where people want to rebind to Any (also inside a union with None), I think it would be nice to implement the "simple workaround", since it seems that it will fix most real use cases.

@JukkaL
Copy link
Collaborator

JukkaL commented Jul 17, 2017

Yeah, it looks like this is a somewhat frequent issue. False positives like this are quite inconvenient for users. The simple workaround modified so that it also fixes #3724 sounds like the best proposal we have for now (in terms of benefit vs cost).

@ddfisher
Copy link
Collaborator Author

We should check how many more Anys the simple proposal would introduce into real world codebases before merging it.

@ddfisher
Copy link
Collaborator Author

And yes, this would conflict with proposal 3 in #2008 -- it would need tweaks.

The option number 3 already mentions

Unclear interactions with Any types.

as a downside. Maybe I am missing something, but this could be a minor tweak. Something like: types are always bound on initial assignments unless r.h.s. is Any.

It's not that simple. Take this circumstance, for example:

x = [1, 2, 3]  # type: Sequence[Any]

We don't want to bind List[int] for x, because we want to preserve Anys (as otherwise Anys in generics would be useless). We don't want to leave it as Sequence[Any], because that would make it behave inconsistently from all other list assignments. We'd have to bind List[Any] as the type for x. This isn't necessarily terrible, but is at a minimum a little tricky.

@ilevkivskyi
Copy link
Member

We'd have to bind List[Any] as the type for x. This isn't necessarily terrible, but is at a minimum a little tricky.

I think bind List[Any] in this case is a reasonable thing to do. Indeed, general interactions with Any may be complex. However, I would still propose to go with Jukka's simple solution (if it doesn't explode the number of Anys) and figure out the detailed rules about Any later, when we get to #2008.

@JukkaL
Copy link
Collaborator

JukkaL commented Sep 18, 2017

#3431 is also similar.

@JukkaL
Copy link
Collaborator

JukkaL commented Dec 5, 2017

This seems to happen frequently enough that I'm bumping priority to "high". Also removing the "needs discussion" label -- let's special case None for now, as discussed above.

@gvanrossum
Copy link
Member

I'm not 100% sure what the proposal is. From "discussed above" link it seems that we should make this work?

def f():...
def g(x: Optional[int]) -> int:
    if x is None:
        x = f()
        reveal_type(x)   # Union[int, Any]  # <---- this is new (currently Optional[int])
    reveal_type(x)  # Union[int, Any]  # Currently also Optional[int]
    return x  # Ok  # Currently an error (Incompatible return value type ...)

But how? And should the fix take into account that f() is unannotated, or should it work for anything of type Any? Let's assume the latter. At the point x = f() we do something special if certain conditions are met. What are those conditions? I'm guessing all of the following:

  • the target's type is known to be None due to a None check recorded in the binder overriding some Optional[t]
  • the value's type is Any

The action is then apparently to set the target's type to Union[t, Any]. The rest will follow.

Have I got that?

@ilevkivskyi
Copy link
Member

@gvanrossum

But how?

I think not much have to be done, the binder would do the things, but it is explicitly prohibited to do many things with AnyType, for example in ConditionalTypeBinder.assign_type:

# Assigning an Any value doesn't affect the type to avoid false negatives

I think we can just add an exception there, so that when declared_type is a union containing None and enclosing_type is None, we still put the type even if it is an AnyType.

@JukkaL JukkaL added the false-positive mypy gave an error on correct code label Aug 14, 2018
@msullivan msullivan self-assigned this Sep 17, 2018
msullivan added a commit that referenced this issue Sep 17, 2018
As a special case, when assigning Any to a variable with a
declared Optional type that has been narrowed to None,
replace all the Nones in the declared Union type with Any.

The result of this is:
```
def f():...
def g(x: Optional[int]) -> int:
    if x is None:
        x = f()
        reveal_type(x)   # Union[int, Any]  # <---- this is new (was Optional[int])
    reveal_type(x)  # Union[int, Any]  # was also Optional[int]
    return x  # Ok  # was an error
```

Fixes #3526.
ilevkivskyi pushed a commit that referenced this issue Sep 18, 2018
As a special case, when assigning Any to a variable with a
declared Optional type that has been narrowed to None,
replace all the Nones in the declared Union type with Any.

The result of this is:
```
def f():...
def g(x: Optional[int]) -> int:
    if x is None:
        x = f()
        reveal_type(x)   # Union[int, Any]  # <---- this is new (was Optional[int])
    reveal_type(x)  # Union[int, Any]  # was also Optional[int]
    return x  # Ok  # was an error
```

Fixes #3526.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants