-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[red-knot] Improve is_disjoint for two intersections #16636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
(Never mind, it works perfectly fine - I made a mistake running the test) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
So something I failed to mention (or remember to do myself) on your previous PR: we have a set of quickcheck property tests, which check a bunch of invariants that should apply to type relations, using randomly-generated types. These don't run in CI (because they are inherently slow and non-deterministic), but they can be very helpful in catching gaps in our understanding of type relations. To run them, I use QUICKCHECK_TESTS=100000 cargo test -p red_knot_python_semantic -- --ignored types::property_tests::stable. (You can adjust the QUICKCHECK_TESTS number to tradeoff test running time with confidence that any failure cases should be caught.)
Your previous PR actually broke a couple of these (all_type_pairs_are_assignable_to_their_union and subtype_of_implies_assignable_to), but this PR seems to fix those! This PR, however, does break disjoint_from_is_irreflexive and disjoint_from_is_symmetric:
---- types::property_tests::stable::disjoint_from_is_irreflexive stdout ----
thread 'types::property_tests::stable::disjoint_from_is_irreflexive' panicked at /Users/carlmeyer/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/quickcheck-1.0.3/src/tester.rs:165:28:
[quickcheck] TEST FAILED. Arguments: (Intersection { pos: [], neg: [Union([BuiltinClassLiteral("bool"), KnownClassInstance(TypeAliasType)])] })
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---- types::property_tests::stable::disjoint_from_is_symmetric stdout ----
thread 'types::property_tests::stable::disjoint_from_is_symmetric' panicked at /Users/carlmeyer/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/quickcheck-1.0.3/src/tester.rs:165:28:
[quickcheck] TEST FAILED. Arguments: (Intersection { pos: [], neg: [KnownClassInstance(NoDefaultType)] }, Intersection { pos: [LiteralString], neg: [KnownClassInstance(TypeAliasType)] })
Also, we have some parallel work going on here accidentally, another contributor just submitted #16641, which also fixes the two property tests currently failing on main, and stabilizes a couple other property tests that are currently flaky. It looks like there is some overlap between that PR and this one, but that one does some additional stuff around understanding literal integer and string types and their relationship to AlwaysTruthy and AlwaysFalsy.
Ultimately I'll merge whichever of these reaches "all green" (including all stable property tests) first (and passes review otherwise), and the other one can rebase on top and add whatever is left.
| static_assert(not is_disjoint_from(Intersection[X, Z], Y)) | ||
| static_assert(not is_disjoint_from(Intersection[Y, Z], X)) | ||
|
|
||
| # If one side has a positive element and the other side has a negative of that element, they are disjoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only true for fully-static types, since if we have dynamic types involved, Any on each side can represent an arbitrarily different type. In the simplest case, Any is not disjoint from Not[Any]: Not[Any] is indistinguishable from Any, and clearly Any is not disjoint from Any. Maybe we should add some more tests for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I suspected in the comment I left above that Any was broken but then I couldn't construct a case to assert it. Found one now that I'll add, and replacing is_assignable_to with is_subtype_of fixes it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_subtype_of is also not a clean fix - regardless of which direction I choose one of these tests fails:
static_assert(not is_disjoint_from(Intersection[Any, X], Intersection[Any, Not[Y]]))
static_assert(not is_disjoint_from(Intersection[Any, Not[Y]], Intersection[Any, X]))
static_assert(is_disjoint_from(Intersection[int, Any], Not[int]))
static_assert(is_disjoint_from(Not[int], Intersection[int, Any]))
Need to think about this some more tomorrow
| || intersection | ||
| .negative(db) | ||
| .iter() | ||
| .any(|&neg_ty| other.is_assignable_to(db, neg_ty)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using is_assignable_to here is not right, but using is_subtype_of might be right? (See my comment above that Any and Not[Any] are not disjoint.)
carljm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can replace the whole of
is_subtype_of
Yes, I think that should be true. The main question would be whether this has any negative impact on performance, since is_subtype_of is used heavily in union type simplification. (It is valid to elide type A from A | B if A is a subtype of B, but not if A is just assignable to B; the latter would wrongly simplify every Any | T to just Any (or just T! It would be arbitrary which one; the non-transitivity of is_assignable_to with dynamic types is a problem here.) I suspect maybe there won't be any performance issue, though, it's certainly worth trying.
I don't think we can (or should) eliminate the concept of is_subtype_of; I think it is necessary to have the concept (and the method). But totally open to simplifying the implementation if we can do that without behavior or performance regression.
| (Type::Intersection(self_intersection), Type::Intersection(target_intersection)) => { | ||
| // Check that all target positive values are covered in self positive values | ||
| target_intersection | ||
| .positive(db) | ||
| .iter() | ||
| .all(|&target_pos_elem| { | ||
| self_intersection | ||
| .positive(db) | ||
| .iter() | ||
| .any(|&self_pos_elem| self_pos_elem.is_subtype_of(db, target_pos_elem)) | ||
| }) | ||
| // Check that all target negative values are excluded in self, either by being | ||
| // subtypes of a self negative value or being disjoint from a self positive value. | ||
| && target_intersection | ||
| .negative(db) | ||
| .iter() | ||
| .all(|&target_neg_elem| { | ||
| // Is target negative value is subtype of a self negative value | ||
| self_intersection.negative(db).iter().any(|&self_neg_elem| { | ||
| target_neg_elem.is_subtype_of(db, self_neg_elem) | ||
| // Is target negative value is disjoint from a self positive value? | ||
| }) || self_intersection.positive(db).iter().any(|&self_pos_elem| { | ||
| self_pos_elem.is_disjoint_from(db, target_neg_elem) | ||
| }) | ||
| }) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty cool that this whole case can be eliminated by improving is_disjoint_from! Really nice find.
|
Need to think this over a bit more, will keep an eye on the other PR and just add my nasty test cases to it if it succeeds before me. |
|
I came up with two solutions for this, both of which work and pass all quickcheck tests, but I'm not necessarily satisified with either. Both cases require special handling of two intersections in Option 1: I'm pretty sure we can't have disjoint intersections with only negative elements - a negative element needs a positive element on the other side in order to become disjoint. This does rely on the builder simplifying things - obviously The other option is to go all in on relying on the builder - to test if two intersections are disjoint we can just combine them into a single intersection and see if the result is never: This feels more clever, but is much less efficient - IntersectionBuilder will add items one by one and check them against all current items so is actually O(n**2), and the fact that The first options is actually also O(n**2), but probably we won't have ever intersections with more than a couple dozen elements? I feel like option 1 is the better choice, will update code shortly |
carljm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! I'm seeing all stable quickcheck tests passing, even at a million runs.
This PR is the best kind of PR: net negative non-test LOC and improves correctness.
crates/red_knot_python_semantic/resources/mdtest/type_properties/is_disjoint_from.md
Outdated
Show resolved
Hide resolved
…es/is_disjoint_from.md
Head branch was pushed to by a user without write access
AlexWaygood
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very elegantly done. Nice work!!
Summary
Background - as a follow up to #16611 I noticed that there's a lot of code duplicated between the
is_assignable_toandis_subtype_offunctions and considered trying to merge them.A subtype and an assignable type are pretty much the same, except that subtypes are by definition fully static, so I think we can replace the whole of
is_subtype_ofwith:if we move all of the logic to is_assignable_to and delete duplicate code. Then we can discuss if it even makes sense to have a separate is_subtype_of function (I think the answer is yes since it's used by a bunch of other places, but we may be able to basically rip out the concept).
Anyways while playing with combining the functions I noticed is that the handling of Intersections in
is_subtype_ofhas a special case for two intersections, which I didn't include in the last PR - rather I first handled right hand intersections before left hand, which should properly handle double intersections (hand-wavy explanation I can justify if needed - (A & B & C) is assignable to (A & B) because the left is assignable to both A and B, but none of A, B, or C is assignable to (A & B)).I took a look at what breaks if I remove the handling for double intersections, and the reason it is needed is because is_disjoint does not properly handle intersections with negative conditions (so instead
is_subtype_ofbasically implements the check correctly).This PR adds support to is_disjoint for properly checking negative branches, which also lets us simplify
is_subtype_of, bringing it in line withis_assignable_toTest Plan
Added a bunch of tests, most of which failed before this fix