Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] feat: Introduce Truthy and Falsy to allow more precise typing #13665

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

Slyces
Copy link
Contributor

@Slyces Slyces commented Oct 7, 2024

Summary

This is a PR trying to address #13632.

Reminder of the problem: when evaluating boolean operations, we could improve precision of types

reveal_type(str_instance() and 8)  # Current: `str | Literal[8]` -- Expected `Literal["", 8]`

The idea is that in some contexts, we can narrow a type to the subset of instances that are truthy (boo(instance) == True) or the subset of instances that are falsy(boo(instance) == False).

As @carljm suggested, a very generic way to express this would be to implement two new types, Truthy and Falsy, and express "the subset of A instances that evaluate to True" as A & Truthy (and conversely for False and Falsy).

Interface

On a high level, this should be fairly straightforward to use when required

  • Introduce two new types Truthy and Falsy
  • Convenience method to create A & Truthy (and falsy): IntersectionBuilder::build_truthy(a_ty)

Interesting Cases

Here's a list of interesting cases that might be unlocked by this more precise type knowledge. Some of them are more or less complicated to handle and maybe shouldn't be supported.

Behaviour of Falsy and Truthy outside of intersections

While they only make sense in intersections and probably should only exist in that context, we do need to implement the behaviour of Falsy and Truthy outside of intersections, as they're a Type variant.
Suggestions are welcome if I didn't handle that part properly, I don't have specific opinions or feedback on this part of the implementation.

Consistency of intersections

As it's been raised in this comment by @AlexWaygood, the existence of Truthy and Falsy coupled with having both positive (must be) and negative (must not be) elements in intersection leads to a question:

  • Do we need both truthy and falsy
  • X & Falsy = X & ~Truthy

Currently, as this PR explores a first attempt with Truthy and Falsy, we need to make sure that only one of the above two representations is used.

Defined falsy and truthy subsets

Some types have a known subset of instances that are falsy:

  • bool → False
  • int → 0
  • str → ""
  • bytes → b''
  • tuple → (,)
  • list → [], dict → {}, set → set() - we can't really represent that at the moment

Conversely for truthy (most types in Python are truthy, so this is rarely defined)

  • bool -> True

This could be used in a few ways (that I can think of)

  • Simplify int & Falsy to Literal[0] (we most probably want that)
  • Simplify int & Truthy | Literal[0] to int
    # Case 1
    reveal_type(int_instance() or 0)  # Should be `int != 0 | 0` → `int`
    • Note: that's a more advanced case of X & Truthy | X & FalsyX (which we should support)
  • Unify int & ~0 (where ~0 is Literal[0] in the negatives) to int & Truthy
    • This is not really a "simplification" as both say the same thing, but we need consistency
  • x1 | x2 where x1 is X & Truthy and x2 is X & Falsy
    • This is what we're doing with BooleanLiteral(true), BooleanLiteral(false) and bool
    • Not coded as I can't think of any other type that has both a known set of  truthy and falsy values

So far, all subsets I can think of have been singletons (no UnionType). This means that my code might maybe either be too simple (can't handle non-singleton) or too complicated (tries to handle non singleton) at times.

I guess that if we want to include this knowledge in the feature, knowing if non-singleton cases should be supported (or if they even exist) would help.

Simplifying X & Falsy / X & Truthy

I think that's a fairly straightforward feature. Some types (e.g. FunctionType) have a constant Truthiness (for FunctionType.bool() → Truthiness::AlwaysTrue), so associating them with Truthy or Falsy can be simplified.

  • X & TruthyX if x.bool() == Truthiness::AlwaysTrue
  • X & FalsyNever if x.bool() == Truthiness::AlwaysTrue
  • ... (same for AlwaysFalse)

Overall, we should probably guarantee at build time that you can't have both AlwaysTrue and AlwaysFalse elements in an intersection, because elements of this intersection would need to have both properties, and that's not possible.

Test Plan

  • Added unit tests in the builder.rs for intersection/union cases
    • Overall, all features written about above should have one or more dedicated tests
  • Added some "integration tests" in infer.rs through boolean chained operations (currently the only way to create X & Truthy or X & Falsy intersections)
  • Edited existing tests with a more precise type inference
  • Solved a random test's // TODO
    • Literal[0] | None | Literal[1] & None became Literal[0] | None with this PR which was the expected behaviour
    • Note that this should have been solved by other features eventually

We could eventually add some additional integration tests that would be "easier to read" by adding support for the following snippet

a = A()
reveal_type(a)  # type: `A`
if A():
   reveal_type(a)  # type: `A & Truthy`

Concerns & Feedback on the current implementation

I think this feature is very exciting. It would allow us to represent types with more precision, which is always nice.

I do have some concerns with my implementation as it is now

  • Taken individually, most cases are easy to follow
  • But it feels like I "hardcorded" many corner cases in the code
    • Especially the InnerIntersectionBuilder::simplify method
  • I think that I can't really picture the kind of intersections we'll work with in practice, so I might either not test enough or test things that can't happen
  • Performance might not be optimal in some areas

As always feedback is welcome, don't hesitate to question the entire direction I took to achieve this, I'll be happy to refine this with whatever we can learn from this first iteration.

Copy link
Contributor

github-actions bot commented Oct 7, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@Slyces Slyces changed the title Feat: introduce Truthy and Falsy to allow more precise typing [red-knot] feat: Introduce Truthy and Falsy to allow more precise typing Oct 7, 2024
@AlexWaygood AlexWaygood added the red-knot Multi-file analysis & type inference label Oct 7, 2024
@carljm
Copy link
Contributor

carljm commented Oct 8, 2024

Thanks for the PR, this is very cool! I've started a review, but there's a lot to consider here, and I have some other pressing things to take care of, so it may be a couple days before I can complete the review.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I'll echo Carl that this is pretty cool!!

Carl's probably better placed to do a full review here, in particular for the logic you added in builder.rs -- but here's some things I noticed, in the meantime:

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
Slyces and others added 3 commits October 8, 2024 13:55
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
- Fix a logic issue when dealing with the falsy/truthy set of instances
  (set of size 1).
- Introduce `TupleType::empty` and others as a convenience method.
@AlexWaygood AlexWaygood dismissed their stale review October 8, 2024 16:06

requested changes from initial review were addressed

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for tackling this! It's quite an ambitious project, and you definitely took it further than I'd anticipated for a first PR on the subject. Reviewing it was super useful in forcing me to think through a bunch of cases carefully, so regardless of what happens with it now, that was a super useful contribution.

My conclusion (recorded in the inline comments) is that I think we need to significantly scale back our aggressiveness in applying conclusions from truthiness and falsiness, which I think will probably simplify some of the handling. And in general I'd like to see the handling in UnionBuilder and IntersectionBuilder be a little more general and do less avoidable repeated work.

Thanks again, and sorry for the delayed review!

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
/// Returns, if it can be expressed, the set of values that are falsy in this type. This is
/// defined for some builtin types (e.g. int, str, ...)
#[must_use]
pub fn falsy_set(&self, db: &'db dyn Db) -> Option<Type<'db>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is really implementing "simplification of intersection with Falsy", which is logic I would expect to find in IntersectionBuilder rather than in a method on Type.

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
}
}

/// Returns, if it can be expressed, the set of values that are truthy in this type. This is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "can be expressed" formulation, here and above, has an implicit "without an explicit intersection type" caveat: all intersections with truthy/falsy "can be expressed" as an intersection. This is another reason why I think these methods should rather be internal implementation details of IntersectionBuilder.

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
Comment on lines +851 to +861
// `X` -> `AlwaysFalse` => `X & Falsy` = `X`
let empty_tuple = Type::Tuple(TupleType::empty(&db));
assert_eq!(
IntersectionBuilder::build_falsy(&db, empty_tuple),
empty_tuple
);

assert_eq!(
IntersectionBuilder::build_falsy(&db, empty_bytes),
empty_bytes
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bug that pre-exists your PR, but unfortunately even Type::Tuple should always have Truthiness::Ambiguous, again because of subclassing: even the "empty tuple" type can include an "empty" instance of a hypothetical tuple subclass that might have arbitrary __bool__ behavior; the __bool__ behavior isn't specified as part of the tuple type. So we should fix this in the Type::bool method so Type::Tuple is always Truthiness::Ambiguous, and rewrite this test to use a different type than tuple.

(This doesn't apply to e.g. BytesLiteral or StringLiteral or IntLiteral -- those types can only represent literals, so we know they are the built-in type, not a subclass.)

assert_eq!(
IntersectionBuilder::build_falsy(
&db,
Type::Tuple(TupleType::new(&db, vec![Type::IntLiteral(0)].into()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned below, we can't actually know that a non-empty Type::Tuple is truthy, so we should fix that in Type::bool and use a different truthy type here.

Comment on lines +367 to +398
if self.positive.contains(&Type::Truthy)
&& self
.positive
.iter()
.all(|ty| ty.bool(db) == Truthiness::AlwaysTrue)
{
self.positive.remove(&Type::Truthy);
}

// If we have `Falsy` and all elements are always false, we can remove it
if self.positive.contains(&Type::Falsy)
&& self
.positive
.iter()
.all(|ty| ty.bool(db) == Truthiness::AlwaysFalse)
{
self.positive.remove(&Type::Falsy);
}

// If we have both `AlwaysTrue` and `AlwaysFalse`, this intersection should be empty.
if self
.positive
.iter()
.any(|ty| ty.bool(db) == Truthiness::AlwaysFalse)
&& self
.positive
.iter()
.any(|ty| ty.bool(db) == Truthiness::AlwaysTrue)
{
self.positive.clear();
self.negative.clear();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ends up calling .bool(db) on every type in self.positive four times; it shouldn't be necessary to do it more than once.

Comment on lines +410 to +436
// If an intersection is `X & Falsy`, try to replace it by the falsy set of `X`
// TODO: this doesn't handle the case `X & Y & Falsy` where `(X & Y)` would have a known
// falsy set (this doesn't happen yet, can it happen?)
if self.positive.len() == 2 && self.positive.contains(&Type::Falsy) {
self.positive.remove(&Type::Falsy);
let ty = self.positive.iter().next().unwrap();
if let Some(falsy) = ty.falsy_set(db) {
self.positive.clear();
self.positive.insert(falsy);
} else {
self.positive.insert(Type::Falsy);
}
}

// If an intersection is `X & Truthy`, try to replace it by the truthy set of `X`
// TODO: this doesn't handle the case `X & Y & Truthy` where `(X & Y)` would have a known
// truthy set (this doesn't happen yet, can it happen?)
if self.positive.len() == 2 && self.positive.contains(&Type::Truthy) {
self.positive.remove(&Type::Truthy);
let ty = self.positive.iter().next().unwrap();
if let Some(truthy) = ty.truthy_set(db) {
self.positive.clear();
self.positive.insert(truthy);
} else {
self.positive.insert(Type::Truthy);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should handle this more like union does, doing pairwise checks between each element (and probably better to do it as we add elements, rather than in a final simplify pass). And preferably the check would be more generic, like simplify_intersection_of(db, a: Type, b: Type) -> Option<Type> (returning None if no simplification is possible, or a Type if the intersection of a and b can simplify to a type) rather than so special-cased to Truthy and Falsy. There are a lot more cases where we can simplify intersections than just Truthy and Falsy (e.g. intersections between a lot of not-identical literal types just simplify to Never). Not saying we have to implement all of those in this PR, just that we should try to set up general infrastructure based on general set-theoretic principles, rather than a long list of special cases, which is going to be both harder to maintain and a lot less efficient.

Comment on lines +86 to +140
// Handle `X & Truthy | X & Falsy` -> `X`
(Type::Intersection(present), Type::Intersection(inserted)) => {
// Detect `X & Truthy | Y & Falsy`
if let (Some(present_ty), Some(inserted_ty)) =
(present.truthy_of(self.db), inserted.falsy_of(self.db))
{
// If `X` = `Y`, we can simplify `X & Truthy | X & Falsy` to `X`
if present_ty == inserted_ty {
to_add = present_ty;
to_remove.push(index);
break;
}
}

// Detect `X & Falsy | Y & Truthy`
if let (Some(present_ty), Some(inserted_ty)) =
(present.falsy_of(self.db), inserted.truthy_of(self.db))
{
// If `X` = `Y`, we can simplify `X & Falsy | X & Truthy` to `X`
if present_ty == inserted_ty {
to_add = present_ty;
to_remove.push(index);
break;
}
}
}

// Corner-case of the previous `X & Truthy | X & Falsy` -> `X`
// Some `X & Truthy` or `X & Falsy` types have been simplified to a
// specific subset of instances of the type.
(Type::Intersection(inter), ty) | (ty, Type::Intersection(inter)) => {
if let Some(inter_ty) = inter.truthy_of(self.db) {
// 'X & Truthy | y' -> test if `y` = `X & Falsy`
if let Some(falsy_set) = inter_ty.falsy_set(self.db) {
if falsy_set == ty {
to_add = inter_ty;
to_remove.push(index);
break;
}
}
}

if let Some(inter_ty) = inter.falsy_of(self.db) {
// 'X & Falsy | y' -> test if `y` = `X & Truthy`
if let Some(truthy_set) = inter_ty.truthy_set(self.db) {
if truthy_set == ty {
to_add = inter_ty;
to_remove.push(index);
break;
}
}
}
}
_ => {}
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not going to comment in detail on this since I've suggested semantic changes in other comments that I think will end up changing it a lot anyway. But let's keep in mind the general principles that we need to think pretty hard about doing things as efficiently as we can, and we should prefer abstracting to general relations between types whenever we can. For instance, we already have the subtype checks below and the bool checks above -- I think all of it, and a lot of the new stuff you're adding, maybe could abstract to a general simplify_union_of(db, a: Type, b: Type) -> Option<Type> which returns None if there is no simplification and a and b should both stay in the union, or a new Type to be added to the union instead of a and b. (Not 100% sure if that will cover all cases without actually working it all out in code myself, but hopefully that gives the flavor of the kind of abstractions we should be looking for to keep this code maintainable.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants