Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for narrowing Literals using equality #8151

Conversation

Michael0x2a
Copy link
Collaborator

This pull request (finally) adds support for narrowing expressions using Literal types by equality, instead of just identity. For example, the following "tagged union" pattern is now supported:

class Foo(TypedDict):
    key: Literal["A"]
    blah: int

class Bar(TypedDict):
    key: Literal["B"]
    something: str

x: Union[Foo, Bar]
if x.key == "A":
    reveal_type(x)  # Revealed type is 'Foo'
else:
    reveal_type(x)  # Revealed type is 'Bar'

Previously, this was possible to do only with Enum Literals and the is operator, which is perhaps not very intuitive.

The main limitation with this pull request is that it'll perform narrowing only if either the LHS or RHS contains an explicit Literal type somewhere. If this limitation is not present, we end up breaking a decent amount of real-world code -- mostly tests -- that do something like this:

def some_test_case() -> None:
    worker = Worker()

    # Without the limitation, we narrow 'worker.state' to
    # Literal['ready'] in this assert...
    assert worker.state == 'ready'

    worker.start()

    # ...which subsequently causes this second assert to narrow
    # worker.state to <uninhabited>, causing the last line to be
    # unreachable.
    assert worker.state == 'running'
    worker.query()

I tried for several weeks to find a more intelligent way around this problem, but everything I tried ended up being either insufficient or super-hacky, so I gave up and went for this brute-force solution.

The other main limitation is that we perform narrowing only if both the LHS and RHS do not define custom __eq__ or __ne__ methods, but this seems like a more reasonable one to me.

Resolves #7944.

@Michael0x2a
Copy link
Collaborator Author

Also -- this PR was written on top of #8148. I can rebase this once the other PR lands.

I also ran this on one of our internal codebases and confirmed there were no new errors, with the exception of one or two (legitimate) "unnecessary cast" errors.

Michael0x2a added a commit to Michael0x2a/mypy that referenced this pull request Dec 16, 2019
This pull request is a long-overdue update of the Literal type docs. It:

1. Removes the "this is alpha" warning we have at the top.

2. Mentions Literal enums are a thing (and works in a brief example of one).

3. Adds a section about "intelligent indexing".

4. Adds a section about the "tagged union" pattern (see
   python#8151). I made the example
   focus on the TypedDict/JSON use case -- IMO that's really the only
   realistically useful use case for the pattern.

5. Cross-references the "tagged union" docs with the TypedDicts docs --
   IMO, tagged unions are mostly useful when you're working with JSON

I also thought about making the "Unions of TypedDict" section I added
to the TypedDicts doc mention using [pydantic][0] as an alternative
to the "tagged union" pattern.

I personally prefer using libraries like this which handle validation
and let me use regular classes (and `isinstance`) instead of dicts, but I
wasn't sure whether we want to be recommending 3rd party libraries so
held off for now.

  [0]: https://pydantic-docs.helpmanual.io
Michael0x2a added a commit to Michael0x2a/mypy that referenced this pull request Dec 16, 2019
This pull request is a long-overdue update of the Literal type docs. It:

1. Removes the "this is alpha" warning we have at the top.

2. Mentions Literal enums are a thing (and works in a brief example of one).

3. Adds a section about "intelligent indexing".

4. Adds a section about the "tagged union" pattern (see
   python#8151). I made the example
   focus on the TypedDict/JSON use case -- IMO that's really the only
   realistically useful use case for the pattern.

5. Cross-references the "tagged union" docs with the TypedDicts docs --
   IMO, tagged unions are mostly useful when you're working with JSON

I also thought about making the "Unions of TypedDict" section I added
to the TypedDicts doc mention using [pydantic][0] as an alternative
to the "tagged union" pattern.

I personally prefer using libraries like this which handle validation
and let me use regular classes (and `isinstance`) instead of dicts, but I
wasn't sure whether we want to be recommending 3rd party libraries so
held off for now.

  [0]: https://pydantic-docs.helpmanual.io
@fluggo
Copy link

fluggo commented Dec 16, 2019

I'm glad to see this landing.

My comment on your second type-narrowing sample, where checking the state limits the operations available, I would argue that the Worker type shouldn't be written to narrow its attributes based on the state. The query() method almost certainly doesn't cease to exist when worker.state == 'ready'. I feel like that's abusing the type system; type narrowing is there to clarify the type of worker as of its last assignment, and not to try to keep up with how the type might change itself as time goes by.

TypeScript doesn't support that use case, and I don't feel like you should have to, either.

@Michael0x2a
Copy link
Collaborator Author

Michael0x2a commented Dec 16, 2019

@fluggo -- ah, sorry. To clarify, the last example isn't attempting to narrow the type of worker in any way, just worker.state.

Basically, the core problem is that just blindly applying the way mypy normally narrows variables ends up causing too many issues. For example, here's how mypy currently narrows the following program:

class A: pass
class B: pass

def test(x: object) -> None:
    assert isinstance(x, A)
    reveal_type(x)  # Revealed type is 'A'

    assert isinstance(x, B)
    reveal_type(x)  # No output: stmt is unreachable; mypy thinks the assert is always false

In this case, the behavior is mostly reasonable. While 'x' could theoretically be a type that multiply inherits from both A and B, it usually isn't, so concluding that the final assert will always be False and the last line to be unreachable is a reasonable heuristic. (And when/if mypy starts understanding the notion of intersection types, it could maybe infer x is an intersection of A and B, but whatever, that's a separate issue.)

So, since Literal types are supposed to be treated more or less as regular subtypes of the underlying type, it might initially make sense to do the same thing there:

def test(x: int) -> None:
    assert x == 10
    reveal_type(x)  # Suppose we narrowed so the revealed type is Literal[10]

    assert x == 20  # Then, mypy would conclude x can't be both 10 and 20
    reveal_type(x)  # ...and since the assert is always false, this is unreachable

Attempting to narrow is reasonable in this specific case, since the last assert genuinely will always be false, since there's no way for the value of 'x' to abruptly change in after the first assert.

But the expression we're narrowing can abruptly change if it's, say, some object attribute instead of just a variable. The 'worker' example I had was just trying to show a realistic example of where this happens: a method changes some internal field, and we check the new value is what we expected in some test case.

This "what if a method changes an attribute and causes our narrowing to be bogus" problem also appears when narrowing regular types with isinstance(...) as well. It's just that it's (probably) pretty uncommon for real-world objects to have some field of type Union[A, B] and keep dynamically switching from one to another.

That is, while our current heuristics for type-level narrowing are sometimes mildly problematic, attempting apply them for value-level narrowing ends up being actually problematic.

So we need to adjust how we do value-level narrowing in some way (disable them for asserts? change the behavior of asserts so they override instead of narrowing? do the narrowing only if the attribute is Final? infer a pseudo instance/Literal hybrid? give up and tell users to use enums if they want this sort of narrowing?), and I ended up going with the "require the LHS or RHS to explicitly use Literals" approach.

And in practice, this ends up meaning that the thing you're narrowing needs to be a Union of Literals. (This conveniently happens to be just what you need if you want tagged unions w/o having to use enum everywhere.)

That said, I'm not really sure whether this is good enough either. Sure, it'll ensure that the 'worker' example doesn't end up with any unreachable statements if worker.state was originally defined to be type str. But if it were defined to be type Literal["ready", "running", "paused", "done"] or something, we'd end up right back where we started with the last assert being inferred to be always False.

¯\_(ツ)_/¯

@fluggo
Copy link

fluggo commented Dec 20, 2019

I see what you're saying now. I ran the same sample through TypeScript, and I was surprised to find that the same example fails there:

class Worker {
  state: 'ready' | 'running';

  start() {
    this.state = 'running';
  }

  query() {
  }
}

function someTestCase(): void {
  const worker = new Worker();

  if(worker.state !== 'ready')
    throw new Error('');

  worker.start();

  if(worker.state !== 'running')   // Error: This condition will always return 'true' since the types '"ready"' and '"running"' have no overlap. ts(2367)
    throw new Error('');

  worker.query();
}

It seems the solution in TypeScript is exactly as you proposed: don't define a "status"-like field as type Literal. TypeScript itself follows this principle for XmlHttpRequest, which defines the readyState as type number instead of the literals it could be. TypeScript only uses literals in static data types or for union discrimination.

@ilevkivskyi
Copy link
Member

I will review this right after #8148 is merged (currently it is a bit too large to review, sorry)

This pull request (finally) adds support for narrowing expressions
using Literal types by equality, instead of just identity. For example,
the following "tagged union" pattern is now supported:

```python
class Foo(TypedDict):
    key: Literal["A"]
    blah: int

class Bar(TypedDict):
    key: Literal["B"]
    something: str

x: Union[Foo, Bar]
if x.key == "A":
    reveal_type(x)  # Revealed type is 'Foo'
else:
    reveal_type(x)  # Revealed type is 'Bar'
```

Previously, this was possible to do only with Enum Literals and the
`is` operator, which is perhaps not very intuitive.

The main limitation with this pull request is that it'll perform narrowing
only if either the LHS or RHS contains an explicit Literal type
somewhere. If this limitation is not present, we end up breaking a decent
amount of real-world code -- mostly tests -- that do something like this:

```python
def some_test_case() -> None:
    worker = Worker()

    # Without the limitation, we narrow 'worker.state' to
    # Literal['ready'] in this assert...
    assert worker.state == 'ready'

    worker.start()

    # ...which subsequently causes this second assert to narrow
    # worker.state to <uninhabited>, causing the last line to be
    # unreachable.
    assert worker.state == 'running'
    worker.query()
```

I tried for several weeks to find a more intelligent way around this
problem, but everything I tried ended up being either insufficient or
super-hacky, so I gave up and went for this brute-force solution.

The other main limitation is that we perform narrowing only if both the
LHS and RHS do not define custom `__eq__` or `__ne__` methods, but this
seems like a more reasonable one to me.

Resolves python#7944.
@Michael0x2a Michael0x2a force-pushed the reachability-unify-equality-and-identity branch from 77622f9 to f82a019 Compare December 25, 2019 04:49
@Michael0x2a
Copy link
Collaborator Author

@ilevkivskyi -- fyi, I finished rebasing this PR

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The general idea looks right to me, I only have a bunch of minor comments.

mypy/checker.py Outdated Show resolved Hide resolved
mypy/checker.py Outdated Show resolved Hide resolved
mypy/checker.py Outdated Show resolved Hide resolved
mypy/checker.py Outdated Show resolved Hide resolved
mypy/checker.py Outdated


# TODO: why can't we define this as an inline function?
# Does mypyc not support them?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean lambda by "inline function"? What exactly goes wrong when you try?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I actually don't remember. I'll try experimenting with this a little later today and see if I can repro.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah, I don't seem to be able to repro this anymore? Maybe it was fixed after I rebased or maybe I just doing something wrong before, but defining an inline function seems to be working fine now.

Defining a lambda unfortunately causes flake8 to complain -- it doesn't like it when you assign a lambda to a variable.

Anyways, I moved this function back to where it's being used.

@@ -655,7 +674,7 @@ def coerce_to_literal(typ: Type) -> ProperType:
enum_values = get_enum_values(typ)
if len(enum_values) == 1:
return LiteralType(value=enum_values[0], fallback=typ)
return typ
return original_type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is right, thanks!

reveal_type(x5) # N: Revealed type is 'TypedDict('__main__.TypedDict1', {'key': Literal['A'], 'foo': builtins.int})'
else:
reveal_type(x5) # N: Revealed type is 'TypedDict('__main__.TypedDict2', {'key': Literal['B'], 'foo': builtins.str})'
[builtins fixtures/primitives.pyi]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add couple more tests with typed dicts (for example nested typed dicts), since this is pretty important use case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to mention this earlier, but I added a few more tests, mostly copying some of the existing ones I had for enums and objects. LMK if there are more test cases you'd like to see.

# is narrowed here -- see 'testNarrowingEqualityFlipFlop' for an example of
# why more precise inference here is problematic.
x_str: str
if x_str == A_final:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand why literal type is "better" than a final name. IMO final names should behave exactly as if they were replaced by their values.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's any consolation, we also won't narrow anything away if we do if x_str == "A" -- I updated the test case with an example.

Basically, I found trying to narrow when we do x_str == "A" often ended up causing too many false positives, which is why I settled on the heuristic of requiring either the LHS or the RHS to be a Literal[...] of some sort before we attempt narrowing.

reveal_type(z)
else:
# TODO: Why do we narrow away 'Literal[1]' here?
# Even if the equality comparison is bogus, we should try and do better here.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit worrying, so I think this deserves a follow-up issue.

Copy link
Collaborator Author

@Michael0x2a Michael0x2a Jan 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is a bit embarrassing. I did some more digging and it turned out that the whole concept of "partial TypeMaps" I introduced in my earlier diff was incorrect and that several of the narrowing test cases I added were too optimistic.

Basically, I made mypy narrow the else case in programs like:

x: Literal[1, 2]
y: Literal[1, 2]
if x == 2 == y:
    reveal_type(x)  # Revealed type is Literal[2]
    reveal_type(y)  # Revealed type is Literal[2]
else:
    reveal_type(x)  # Revealed type is incorrectly Literal[1]!
    reveal_type(y)  # Revealed type is incorrectly Literal[1]!

But actually, we could enter the else case if x == 2 and y == 1 or vice versa, so narrowing was incorrect.

Going back to using just or_conditional_maps instead of or_partial_conditional_maps ended up fixing these bogus narrowings. I also added a few more test cases + fixed some earlier enum ones.

@Michael0x2a
Copy link
Collaborator Author

This should be ready for a second look whenever fyi.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updates! Looks good now, just one comment.

reveal_type(x) # N: Revealed type is 'Literal[__main__.Foo.B]'
reveal_type(y) # N: Revealed type is 'Literal[__main__.Foo.B]'
reveal_type(x) # N: Revealed type is '__main__.Foo'
reveal_type(y) # N: Revealed type is '__main__.Foo'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spacing before comments is kind of inconsistent here and below.

@Michael0x2a Michael0x2a merged commit 3dce3fd into python:master Jan 8, 2020
@Michael0x2a Michael0x2a deleted the reachability-unify-equality-and-identity branch January 8, 2020 16:20
Michael0x2a added a commit that referenced this pull request Jan 8, 2020
This pull request is a long-overdue update of the Literal type docs. It:

1. Removes the "this is alpha" warning we have at the top.

2. Mentions Literal enums are a thing (and works in a very brief example of one).

3. Adds a section about "intelligent indexing".

4. Adds a section with an example about the "tagged union" pattern
    (see #8151).

5. Cross-references the "tagged union" docs with the TypedDicts docs.
hauntsaninja pushed a commit to hauntsaninja/mypy that referenced this pull request Jan 9, 2020
In theory, this draft commit fixes python#8236

However, it looks like the ComparisonExpr part of
find_isinstance_check_helper was rewritten today morning in python#8151.
It looks like it's not super easy to merge: `is_valid_target` gets in the
way and I'm a lot less sure about the change.

I'm also not fully sure about the implications of making
AssignmentExpr into mypy.literals, but seems like it's what we'd want.

I also don't really like the way I've branched for AssignmentExpr, it's
pretty unsatisfactory. Well, work in progress!
hauntsaninja pushed a commit to hauntsaninja/mypy that referenced this pull request Jan 9, 2020
In theory, this draft commit fixes python#8236

However, it looks like the ComparisonExpr part of
find_isinstance_check_helper was rewritten today morning in python#8151.
It looks like it's not super easy to merge: `is_valid_target` gets in the
way and I'm a lot less sure about the change.

I'm also not fully sure about the implications of making
AssignmentExpr into mypy.literals, but seems like it's what we'd want.

I also don't really like the way I've branched for AssignmentExpr, it's
pretty unsatisfactory. Well, work in progress!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Narrowing unions with Literal and TypedDict
3 participants