Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Covariance is not inferred for NamedTuple and frozen Dataclasses #17623

Open
nielsbuwen opened this issue Aug 2, 2024 · 3 comments
Open

Covariance is not inferred for NamedTuple and frozen Dataclasses #17623

nielsbuwen opened this issue Aug 2, 2024 · 3 comments
Labels
bug mypy got something wrong

Comments

@nielsbuwen
Copy link

The new experimental feature NewGenericSyntax (#15238) should infer the variance of type variables. For attributes in NamedTuple and frozen dataclass the variance is inferred as "invariant" when it should be "covariant"
(A clear and concise description of what the bug is.)

To Reproduce

I defined a generic NamedTuple and frozen dataclass with one member of type T. The classes should be covariant in T. So converting from Container[int] to Container[int |None] should be no problem.

playground link

Expected Behavior

mypy should accept all four variants of the type def

Actual Behavior

For Sequence and tuple[T, ...] it correctly accepts the function definition.

But for NamedTuple and dataclass it says

Incompatible return value type (got "XY[int]", expected "XY[int | None]") [return-value]

It looks like mypy does not think that the NamedTuple and dataclass are not covariant in T.

Your Environment

  • Mypy Playgrond
  • Python: 3.12
  • with or without --strict
  • --enable-incomplete-feature=NewGenericSyntax
  • versions tried: 1.11.1 and master
@nielsbuwen nielsbuwen added the bug mypy got something wrong label Aug 2, 2024
@erictraut
Copy link

It makes sense that variables in a NamedTuple and a frozen dataclass should be considered read-only and should therefore not cause a TypeVar to be inferred as invariant.

Interestingly, Python 3.13's addition of a __replace__ method effectively breaks this. Mypy was recently updated (as was pyright) to synthesize a __replace__ method in NamedTuples and dataclasses. This is done only for Python 3.13 and newer. Building on your playground code sample, the signature of of this synthesized method is:

@dataclass(frozen=True)
class FrozenDataclassBox[T]:
    value: T

    def __replace__(self, *, value: T) -> Self: ...

This __replace__ signature uses T in a contravariant position, which means that the auto-variance algorithm will determine that T is invariant. See this issue for details.

You can see this here in the pyright playground. If you hover over T, you'll see that pyright infers it to be covariant.

image

However, if the python version is then changed to Python 3.13, its variance changes to invariant, and a type error is reported in the upcast function.

image

@max-muoto, I'm interested in your thoughts here since you added the support for __replace__ in mypy. I used the same approach in pyright. This variance problem didn't occur to me previously.

Here are some options for us to consider:

  1. We live with the consequences that the synthesized __replace__ always makes any TypeVars used in a NamedTuple or a dataclass invariant starting in Python 3.13.
  2. We add a special-case exemption for __replace__ in the TypeVar variance algorithm. There are already exemptions for __init__ and __new__, so maybe this is acceptable.
  3. We modify the signature for __replace__. I can't think of any way of doing this without making __replace__ unsafe from a type checking perspective. It would probably need to accept parameters of type object and return an object of type Any.

@JelleZijlstra
Copy link
Member

I thought about synthesizing a signature like this:

@dataclass(frozen=True)
class FrozenDataclassBox[T]:
    value: T

    def __replace__[U = T](self, *, value: U = ...) -> Self: ...

(Note the addition of a default for value. Every parameter to the __replace__ method should be optional.)

Now, if (given fcb: FrozenDataclassBox[bool]) you call fcb.__replace__(), U should get solved to the value of T, so you get a FrozenDataclassBox[bool], which is correct. And if you do fcb.__replace__(value=1), U gets solved to int, and you get a FrozenDataclassBox[int]. I believe this also extends to the general case where there may be multiple fields and multiple type parameters. If the original parameter T has bounds or constraints, then the synthesized type parameter U should have the same bounds or constraints.

However, this is invalid because TypeVar defaults may not use TypeVars from an outer scope (https://typing.readthedocs.io/en/latest/spec/generics.html#scoping-rules) and because the meaning of TypeVar defaults with function-scoped type parameters is unspecified (https://typing.readthedocs.io/en/latest/spec/generics.html#function-defaults). I tried to get around that by using a different type variable, but pyright still doesn't allow it (https://pyright-play.net/?pythonVersion=3.13&strict=true&code=GYJw9gtgBAJghgFzgYwDZwM4YKYagSwgAcwQFZEV0sBYAKFEigQE8j8A7AcwONPMw4yAfVZFs9egAF4SNIIAUjAF7YOAXgAqIAK7YAlPXlYoAMXCqOAEUrGMAITAAPANqaAugC56UX1ABucKh6nlCaknR%2BsNjAUMLCINhE6MjY8W4ATAA0YQDMUOphGe4KOKjAoeZgljZy1A7Ome45AFQ5gcHYoZr5hQB0A-pQALQAfGYWarVUgo6uPV5QA30RMDHMuAhKyABGlZPWtvVzLjtgYKju%2Bt6RfoLYImLY2zt98YnJKGnCCvo5VTUjrNGmcLlcfHcsA8EKI2M9gLs3gkkilvgoOnp1ABGP4TapTIFYE6cBDguhAA), and I don't think its behavior is incorrect. Still, perhaps a type checker could synthesize an internal signature following this approach.

@max-muoto
Copy link
Contributor

max-muoto commented Aug 3, 2024

It makes sense that variables in a NamedTuple and a frozen dataclass should be considered read-only and should therefore not cause a TypeVar to be inferred as invariant.

Interestingly, Python 3.13's addition of a __replace__ method effectively breaks this. Mypy was recently updated (as was pyright) to synthesize a __replace__ method in NamedTuples and dataclasses. This is done only for Python 3.13 and newer. Building on your playground code sample, the signature of of this synthesized method is:

@dataclass(frozen=True)
class FrozenDataclassBox[T]:
    value: T

    def __replace__(self, *, value: T) -> Self: ...

This __replace__ signature uses T in a contravariant position, which means that the auto-variance algorithm will determine that T is invariant. See this issue for details.

You can see this here in the pyright playground. If you hover over T, you'll see that pyright infers it to be covariant.

image However, if the python version is then changed to Python 3.13, its variance changes to invariant, and a type error is reported in the `upcast` function. image @max-muoto, I'm interested in your thoughts here since you added the support for `__replace__` in mypy. I used the same approach in pyright. This variance problem didn't occur to me previously.

Here are some options for us to consider:

  1. We live with the consequences that the synthesized __replace__ always makes any TypeVars used in a NamedTuple or a dataclass invariant starting in Python 3.13.
  2. We add a special-case exemption for __replace__ in the TypeVar variance algorithm. There are already exemptions for __init__ and __new__, so maybe this is acceptable.
  3. We modify the signature for __replace__. I can't think of any way of doing this without making __replace__ unsafe from a type checking perspective. It would probably need to accept parameters of type object and return an object of type Any.

This is definitely something I didn't think of when adding this to dataclasses, but it makes sense that we'd start to hit this. I think special-casing is probably the best path forward here, I obviously don't maintain any type-checkers, but the main concern here that comes to mind is the lack of documentation around this + all of the current special casing for __replace__. MyPy and Pyright will obviously have the correct behavior, and we can create issues for Pyre/PyType, but I would be worried about others implementing type-checkers. I made a cpython issue to document this alongside other dataclass behavior, but not sure that will happen: python/cpython#121371

At least, in my experience, the more I can rely on auto-variance from 3.12 type parameters, the faster I can work with the typing system, so I think making it work correctly by default, even with special casing, is probably the best option for most users.

I think it would be good to define this implicit behavior around synthesizing __replace__ in the typing spec, and also updating the docs for the auto-variance algorithm. Happy to contribute here.

I do like @JelleZijlstra's idea though, I could play around with it in MyPy to see if it might work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug mypy got something wrong
Projects
None yet
Development

No branches or pull requests

4 participants