Skip to content

Keep trivial instances and aliases during expansion #19543

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 1, 2025

Conversation

ilevkivskyi
Copy link
Member

This weirdly looking change consistently shows 1% performance improvement on my machine (Python 3.12, compiled). The key here is to not create new objects for trivial instances and aliases (i.e. those with no .args). The problem however is that some callers modify expanded type aliases in place (for better error locations). I updated couple places discovered by tests, but there may be more.

I think we should go ahead, and then fix bugs exposed by this using following strategy:

  • Always use original aliases (not theirs expansions) as error locations (see change in semanal.py).
  • If above is not possible (see unpacked tuple change), then call t.copy_modified() followed by t.set_line() manually.

This comment has been minimized.

@ilevkivskyi
Copy link
Member Author

The new error in pandas is weird. But it is actually expected, taking into account that TypeGuard has this weird rule that we type-checkers must apply it verbatim, rather than do some narrowing logic (in this case it actually widens the declared type). But:

  • Any part is surprising.
  • Not clear why it behaved differently before.

I will try to come up with a small repro, and look in more details.

@sterliakov
Copy link
Collaborator

It has something to do with omitted generics (np.integer has one generic param for bit width, defaulting to Any). If I use -> TypeGuard[int | np.int8], for example, mypy master also reports this error, and neither master nor this HEAD add Any.

Both on master and here:

import numpy as np
from typing import TypeGuard

def is_smth(t: object) -> TypeGuard[int | np.int8]: ...

def check(a: int, b: int) -> int:
    if not is_smth(a): assert False
    if a < 0: a += b  # E: Incompatible types in assignment (expression has type "int | signedinteger[_8Bit]", variable has type "int")
    if not 0 <=a <= b: assert False
    return a  # E: Incompatible return value type (got "int | signedinteger[_8Bit]", expected "int")

But replacing np.int8 with np.integer breaks things: master reports nothing, this PR reports return but with Any in union:

import numpy as np
from typing import TypeGuard

def is_smth(t: object) -> TypeGuard[int | np.integer]: ...

def check(a: int, b: int) -> int:
    if not is_smth(a): assert False
    if a < 0: a += b
    if not 0 <=a <= b: assert False
    return a  # E: Incompatible return value type (got "Any | int | integer[Any]", expected "int")

@ilevkivskyi
Copy link
Member Author

Nice, thanks for the pointer!

@ilevkivskyi
Copy link
Member Author

OK, I figured it out. So two things here:

  • The if a < 0: a += b part is important here. This is what adds Any (on both master and HEAD)
  • The actual difference comes from a little refactoring I made in flatten_nested_union(). This refactoring should be a no-op, except get_proper_type() also unwraps TypeGuardedType (I think we should get rid of this). As a result, since update_from_options() calls make_simplified_union(), this "use type guard type no matter what" behavior was single-shot, after first use it was unwrapped. But now it is "sticky". FWIW I really don't like this idea to "narrow" types to proper supertype of declared type. But I also don't want to mix semantic changes with performance optimizations, so I will restore existing "single-shot" behavoiur for now. Later we can discuss if we really want this cc @JelleZijlstra

This comment has been minimized.

@ilevkivskyi
Copy link
Member Author

Hm, mypy_primer is interesting, if I implement the old semantics in more principled way, it looks like more what people expect. Also it looks like we will have some semantic changes one way or another, unless I do some ugly special-casing. So maybe I can try some alternative way, similar to what I mentioned above.

Copy link
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

pandas (https://github.com/pandas-dev/pandas)
+ pandas/io/parsers/python_parser.py:1552: error: Unused "type: ignore" comment  [unused-ignore]
- pandas/core/series.py:2839: error: Argument 2 to "diff" has incompatible type "float | int | integer[Any]"; expected "int"  [arg-type]
+ pandas/core/series.py:2839: error: Argument 2 to "diff" has incompatible type "float | int"; expected "int"  [arg-type]
+ pandas/core/frame.py:5897: error: Redundant cast to "int"  [redundant-cast]
+ pandas/core/groupby/groupby.py:5196: error: Redundant cast to "int"  [redundant-cast]

mypy (https://github.com/python/mypy)
+ mypy/types.py:3217: error: Unused "type: ignore" comment  [unused-ignore]
+ mypy/types.py:3242: error: Unused "type: ignore" comment  [unused-ignore]
+ mypy/meet.py:119: error: Unused "type: ignore" comment  [unused-ignore]
+ mypy/meet.py:311: error: Unused "type: ignore" comment  [unused-ignore]

@ilevkivskyi
Copy link
Member Author

OK, I like the current version the most. Although it is not identical to the original semantics, it is IMO more consistent (and again, looking few changes in the primer, this is what people expect).

Copy link
Collaborator

@sterliakov sterliakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I like the primer results! The change itself LG, only copying explicitly if we need that is way more principled than current "hope it was already a copy" approach. There are likely other places assuming copies, but we'll find them eventually. Let's also wait for the mypy-issues check to see if there are any relevant regressions

@sterliakov
Copy link
Collaborator

Okay, nothing interesting. There's one changed output for the snippet in this comment:

#17483 (comment)

With int instead of str the output is unchanged.

@ilevkivskyi
Copy link
Member Author

Yeah, this is because str is a subclass of Sequence[str] (for better or worse).

@ilevkivskyi ilevkivskyi merged commit 7c4ec52 into python:master Aug 1, 2025
20 checks passed
@ilevkivskyi ilevkivskyi deleted the same-instance branch August 1, 2025 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants