Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[red-knot] simplify subtypes from unions #13401
[red-knot] simplify subtypes from unions #13401
Changes from all commits
22648ea
37c695f
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In equivalent mypy code, I had to add a special fast path for literals. You can do better than quadratic for unions with lots of literals of the same type, which turns out to be a thing in the wild
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, thanks for pointing this out! I took a look at that optimization in mypy.
I think the case this would optimize is where a union already contains e.g.
str
, and then we try to add lots of string literal types to it, every one of which is redundant because its a subtype ofstr
. Rather than going through all existing union members to check if each literal is a subtype of any of them, we can keep a hash-set of "types present in this union which have literal forms" and do an O(1) contains check against that set as the first step when adding a literal type to the union. Framed in more general terms, it's identifying that a certain set of common types have a single super-type that is most likely to rule them out of the union, and so we optimize checking for that most likely super-type by identity.This makes sense; I'd prefer to wait to add this kind of optimization until we see it crop up in a real-world codebase and can evaluate the actual impact of the optimization in our case, but it's definitely a useful idea to keep in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what would be useful is if we added one (or more) benchmarks based on a real-world codebase that makes heavy use of large literals. (I.e., pydantic.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not quite the right description, it's also useful when the union doesn't contain the supertype (i.e. str). For instance, say if you were combining two unions that you knew consisted only of literal types, you could use a set union, which is linear. The mypy optimisation I added is basically that, but also works when there are non-literal types thrown in as well. Fair enough on waiting though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, thanks, yeah, I misread the code. The set is
unduplicated_literal_fallbacks
, notduplicated_literal_fallbacks
. So it looks like it's optimizing only the case you described; the mirror image of the case I described.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created a new issue collating some of the perf issues mypy and pyright have encountered relating to unions: #13549
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
I think there are some mypy PRs missing from the list, so if you're interested in code I'd make sure to look at main.
I'll also make it such that if you're interested in real world use cases you should only have to look at primer, looks like there are 1-2 things I never actually added.