Type constraints: dict.get doesn't accept Union type as key #1472

Closed
ddfisher opened this issue May 3, 2016 · 9 comments

Comments

@ddfisher
Collaborator

ddfisher commented May 3, 2016

Passing an overly-general Union to dict.get, like this:

from typing import Union

x = 5  # type: Union[int, str]
{ 1: 1 }.get(x, 0)

results in this error:

test.py:4: error: Argument 1 to "get" of "dict" has incompatible type "Union[int, str]"; expected "int"

Arguably, this should be allowed, because get provides a default if the item is not found (unlike __getitem__). But by that logic, potentially any type should be a valid argument to get, which doesn't seem right.
@gvanrossum: further thoughts?
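For context, a minimal sketch of both sides of this. At runtime, `get` with a key of an unrelated type simply misses and returns the default. Under the current strict signature, the usual way to satisfy the checker is to narrow the union with `isinstance` first. The function name `lookup` is hypothetical and only for illustration:

```python
from typing import Dict, Union


def lookup(d: Dict[int, int], x: Union[int, str]) -> int:
    # Narrowing the union to int makes the argument match the key type,
    # so a checker that requires get(key: _KT) accepts this call.
    if isinstance(x, int):
        return d.get(x, 0)
    # A str never compares equal to an int key, so the default is the answer.
    return 0


# At runtime, get() with a key of an unrelated type just misses.
print({1: 1}.get("one", 0))  # prints 0
```

The narrowing is exactly the boilerplate the issue suggests should be unnecessary when the types overlap.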

@gvanrossum
Member

The reasonable middle ground I hope for would require that there be some overlap (as between int and Union[int, str] in the example). @JukkaL: I don't know whether that's a reasonable thing to expect mypy to implement. I don't think this can be solved with just a typeshed update.

@rwbarton
Contributor

rwbarton commented May 5, 2016

I've also encountered exactly the same case, and there are many related situations: del x[i], i in x, y.remove(i) for y a list. In all these situations type safety is not actually threatened, but it's probably a mistake if the type of i is totally unrelated to the key type of x (or element type of y). set.intersection is another example. For get mypy currently requires that the types match, while for set.intersection it doesn't; I'm not sure how it treats the other cases I listed.
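The runtime behavior of the operations listed above supports the point that type safety isn't at stake. An element of an unrelated type is simply "not found": membership and `set.intersection` give a falsy result, while `del` and `list.remove` raise the same `KeyError`/`ValueError` they raise for any missing element, never a type error. A small probe (the function name is illustrative):

```python
from typing import Dict


def not_found_behaviors() -> Dict[str, object]:
    """Probe how builtin collections treat an element of an unrelated type."""
    d = {1: "one"}
    y = [1, 2, 3]
    results: Dict[str, object] = {
        "in": "1" in d,                                  # membership: just False
        "intersection": {1, 2}.intersection({"1", "2"}),  # empty set
    }
    try:
        del d["1"]
    except KeyError:
        results["del"] = "KeyError"        # same as for any missing key
    try:
        y.remove("1")
    except ValueError:
        results["remove"] = "ValueError"   # same as for any missing element
    return results


print(not_found_behaviors())
```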

Arguably the theme underlying all these cases is whether it makes sense to compare i for equality with the keys or elements of the collection. (If you wanted to implement your own versions of these collections, you would at least need to be able to do such equality tests.) And == has the same issue; see #1271. So I think this issue is closely related to that one.
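To make the equality point concrete, here is a hypothetical association-list mapping (`AssocMap` is illustrative, not something from the thread). Its lookups need nothing from the key except the ability to compare it with `==`. Typing the lookup key as `object` is the fully permissive end of the spectrum. The proposals in this thread aim for something between that and requiring exactly `K`:

```python
from typing import Generic, List, Optional, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class AssocMap(Generic[K, V]):
    """Hypothetical map backed by a list of pairs; lookups rely only on ==."""

    def __init__(self) -> None:
        self._items: List[Tuple[K, V]] = []

    def put(self, key: K, value: V) -> None:
        for i, (k, _) in enumerate(self._items):
            if k == key:
                self._items[i] = (key, value)  # overwrite an equal key
                return
        self._items.append((key, value))

    def get(self, key: object, default: Optional[V] = None) -> Optional[V]:
        # The lookup key is typed as object: the body only compares it with ==.
        for k, v in self._items:
            if k == key:
                return v
        return default

    def __contains__(self, key: object) -> bool:
        return any(k == key for k, _ in self._items)
```

With `put(1, ...)`, the test `1.0 in m` is true because `1.0 == 1`. A key of a different but overlapping type can legitimately match.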

One interesting experiment would be to see whether the more specific type for dict.get is needed in practice for type inference reasons. I suspect it isn't really needed, and in that case, we have more options. From a technical point of view I think it would be best to have the ability to specify constraints on the parameters of generic functions, like (strawman syntax; I imagine this exact syntax isn't feasible)

class dict(MutableMapping[_KT, _VT], Generic[_KT, _VT]):
    def get(self, k: K, default: _VT = None) -> _VT if Equatable[_KT, K]: ...

The idea would be that these constraints do not affect inference, but are simply checked just before trying to instantiate a generic function at a call site. Type variable value restrictions and upper bounds could also potentially be recast in this form (though the existing syntax would still be more convenient in many cases).

@ddfisher
Collaborator Author

ddfisher commented May 5, 2016

That looks an awful lot like type classes...


@rwbarton
Contributor

rwbarton commented May 9, 2016

That's not entirely a coincidence, and may reflect my biases more than anything else. It's really just the constraints part though, and not type classes themselves (class constraints are one possible form of constraint, but there are others such as type equality or subtyping constraints). I'm not proposing to allow the user to define their own type classes or instances.

We could also reverse my last comment, and turn the constraint into an attribute of one of the type variables involved, which fits better with existing mechanisms.

_KTe = TypeVar('_KTe', equatable=_KT)
class dict(MutableMapping[_KT, _VT], Generic[_KT, _VT]):
    def get(self, k: _KTe, default: _VT = None) -> _VT: ...

The intended meaning of equatable is: _KTe can only take on values that are "equatable" with _KT (which is expected to be in scope at the use site of _KTe). "Equatable" could mean anything we like, but let's say it means that the types _KT and _KTe overlap. This is similar to bound, although for this use case we need to allow the argument to be another type variable, which bound does not support.
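One rough way to model "the types overlap" is a nominal, runtime approximation that treats a Union as the set of its member classes. The helper `types_overlap` is hypothetical. It ignores `Any`, generics, and the possibility that two unrelated classes share a subclass through multiple inheritance, so a real checker would need something more careful. It relies on `typing.get_origin`/`get_args` (Python 3.8+):

```python
import types
from typing import Tuple, Union, get_args, get_origin

# On Python 3.10+, PEP 604 unions (int | str) may report types.UnionType as origin.
_UNION_TYPE = getattr(types, "UnionType", None)


def _members(tp: object) -> Tuple[type, ...]:
    """Flatten Union[A, B] into (A, B); a plain class is a one-member tuple."""
    origin = get_origin(tp)
    if origin is Union or (_UNION_TYPE is not None and origin is _UNION_TYPE):
        return tuple(arg for arg in get_args(tp) if isinstance(arg, type))
    return (tp,) if isinstance(tp, type) else ()


def types_overlap(a: object, b: object) -> bool:
    """Nominal approximation: some member of a subclasses some member of b, or vice versa."""
    return any(
        issubclass(x, y) or issubclass(y, x)
        for x in _members(a)
        for y in _members(b)
    )
```

Under this definition `int` and `Union[int, str]` overlap, which is the case in the original report, while `int` and `str` do not.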

@JukkaL
Collaborator

JukkaL commented May 9, 2016

I've thought about supporting constraints for (generic) functions that live outside the type signature proper, though the idea of "equatable" is new to me. It sounds like a good idea.

I can see us having two kinds of constraints -- type-based constraints and value-based constraints. The latter would perhaps only work for argument values that are literals or constants.

Here are some examples where type based constraints could come in useful, in addition to the examples given by Reid above:

  • Subtyping constraints could be useful for sort() and __gt__ (etc.) methods of list. They could have an additional constraint that the list type variable T is a subtype of Comparable. Currently there's no way to represent this. For most methods comparability is not necessary, so we can't have it as a bound of the type variable.
  • We could also have a general "supports operation" constraint for all binary (and unary) operations, instead of just for equality. For example, SupportsAdd[T, S, X] would be satisfied if t + s (where t has type T and s has type S) has type X. The sum builtin could use this. sort() of list could have a constraint such as SupportsLt[T, T, object]. Structural subtyping would not be good enough for this because of various special properties of binary operations, such as reverse methods.
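Two small sketches for the points above. First, for a free function, a bound on a protocol with `__lt__` already expresses comparability. `SupportsLessThan` and `sorted_copy` are illustrative names, and `typing.Protocol` needs Python 3.8+. The limitation described above is that `list`'s own type variable can't carry such a bound, because most of its methods don't need it. Second, the final lines show why plain structural checks fall short for binary operators: `int.__add__` doesn't know about `Fraction`, yet `1 + Fraction(1, 2)` works through `Fraction.__radd__`.

```python
from fractions import Fraction
from typing import Any, List, Protocol, TypeVar


class SupportsLessThan(Protocol):
    def __lt__(self, other: Any) -> bool: ...


T = TypeVar("T", bound=SupportsLessThan)


def sorted_copy(items: List[T]) -> List[T]:
    # The bound applies only to this function, not to every use of List[T].
    return sorted(items)


# int.__add__ returns NotImplemented for a Fraction operand...
print((1).__add__(Fraction(1, 2)) is NotImplemented)  # prints True
# ...but the + operator falls back to Fraction.__radd__.
print(1 + Fraction(1, 2))  # prints 3/2
```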

Here are potential examples of value constraints:

  • subprocess.check_output could have a value constraint on the universal_newlines argument. If the value is the literal True, the return type is str; if it's False, the return type is bytes (Python 3).
  • open could have a value constraint on the mode argument. If it contains the substring 'b', the result would be IO[bytes]; otherwise, it would be IO[str] (again Python 3).

It's not obvious what to do when a value constraint can't be evaluated because the value is determined at runtime. To be compatible with how things work right now, we could fall back to Any.
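Literal types (PEP 586, Python 3.8), introduced after this discussion, made value-dependent return types like these expressible with `@overload`. The sketch below uses a hypothetical wrapper: `open_file` and its `binary` flag are illustrative, not a real API. The first two overloads cover literal arguments. The last covers a flag known only at runtime and falls back to `IO[Any]` as suggested above. Replacing it with `Union[IO[str], IO[bytes]]` would give the union-returning behavior proposed in the next comment.

```python
from typing import IO, Any, Literal, overload


@overload
def open_file(path: str, binary: Literal[False] = ...) -> IO[str]: ...
@overload
def open_file(path: str, binary: Literal[True]) -> IO[bytes]: ...
@overload
def open_file(path: str, binary: bool) -> IO[Any]: ...


def open_file(path: str, binary: bool = False) -> IO[Any]:
    # One runtime implementation; the overloads above only guide the checker.
    return open(path, "rb" if binary else "r")
```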

We can't invent any new syntax for this. Here are a few ideas for a constraint syntax:

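# Option 1: a decorator (typing.constraints is a proposed module; it does not exist)
from typing.constraints import constraint, Equatable

@constraint(Equatable[T, S])
def is_equal(x: T, y: S) -> bool:
    return x == y

# Option 2: a special comment at the top of the function body
def is_equal(x: T, y: S) -> bool:
    # constraint: Equatable[T, S]
    """docstring"""
    return x == y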

A decorator could easily make the constraints available for introspection, which could be useful. If we used a comment, we'd have more freedom with syntax. For example, we could have a constraint such as Subtype[T + S, str] to denote that the type of adding values of types T and S is str. However, this could be pretty confusing, as the constraint expression can't be evaluated at runtime.
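To illustrate the introspection point, here is a hypothetical stand-in for the proposed decorator. Since no `typing.constraints` module exists, `constraint`, `equatable`, and `constraints_hold` are all illustrative names. The decorator only records predicates on the function. The toy `equatable` check treats two classes as equatable when one subclasses the other. Unlike what the thread proposes, the check here runs against runtime argument types rather than in a static checker.

```python
from typing import Any, Callable, Tuple, TypeVar

F = TypeVar("F", bound=Callable[..., Any])
Check = Callable[[type, type], bool]


def constraint(*checks: Check) -> Callable[[F], F]:
    """Hypothetical decorator: attach constraint predicates for later introspection."""
    def decorate(func: F) -> F:
        existing: Tuple[Check, ...] = getattr(func, "__constraints__", ())
        setattr(func, "__constraints__", existing + checks)
        return func
    return decorate


def equatable(t: type, s: type) -> bool:
    """Toy stand-in for Equatable[T, S]: the two classes are nominally related."""
    return issubclass(t, s) or issubclass(s, t)


@constraint(equatable)
def is_equal(x: Any, y: Any) -> bool:
    return x == y


def constraints_hold(func: Callable[..., Any], *args: Any) -> bool:
    """Evaluate the recorded predicates against the runtime types of the arguments."""
    arg_types = [type(arg) for arg in args]
    return all(check(*arg_types) for check in getattr(func, "__constraints__", ()))
```

Here `is_equal.__constraints__` is `(equatable,)`, and `constraints_hold(is_equal, 1, "a")` is false while `constraints_hold(is_equal, True, 1)` is true.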

Even if we decide to implement this, this is likely less urgent than a bunch of major things in the pipeline such as strict optional checking, parallel type checking and a plugin system. This is also likely less important than structural subtyping.

@ddfisher
Collaborator Author

ddfisher commented May 9, 2016

To address just a small part of what you said: if we can't determine the particular value in a value constraint, the return type should probably just be a Union of all the possibilities.

@JukkaL
Collaborator

JukkaL commented May 10, 2016

Having it be a union of all possibilities could sometimes be pretty inconvenient. Also, it would be against the mypy policy of trying to not infer union types unless explicitly annotated.

For example, consider open(path, mode), where mode is a variable. The result would be Union[IO[str], IO[bytes]], and here the programmer would likely need to insert a cast to IO[str] or IO[bytes]. Example:

with cast(IO[str], open(path, mode)) as f: ...

The cast wouldn't be terrible (at least as long as this isn't a common idiom), since the code could be wrong if mode happens to have b in it, and the cast would highlight this.
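At runtime `typing.cast` returns its argument unchanged, so the cast is an unchecked promise to the checker. That is why it marks the spot where the code could go wrong if `mode` contains `'b'`. A small sketch (`read_text` is a hypothetical helper):

```python
from typing import IO, cast


def read_text(path: str, mode: str) -> str:
    # The cast tells the checker "this is a text stream"; nothing verifies it.
    with cast(IO[str], open(path, mode)) as f:
        return f.read()
```

Called with `mode="rb"`, this function returns `bytes` despite its `-> str` annotation, and no error is reported at runtime.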

@ddfisher
Collaborator Author

But you'd only need to do that if mode was supplied dynamically, which is pretty rare. Requiring that cast is probably a good thing -- forgetting that open() might return bytes when you're expecting a str (or vice versa) is an easy mistake to make in that circumstance, in my opinion.

@JukkaL JukkaL changed the title "dict.get won't accept Union type" to "Type constraints: dict.get doesn't accept Union type as key" on Jun 6, 2017
@JukkaL
Collaborator

JukkaL commented Jan 28, 2020

I think that this is probably too complex to be worth implementing in the medium term. Anyway, type system extensions should now be proposed through the typing issue tracker or typing-sig@.

@JukkaL JukkaL closed this as completed Jan 28, 2020