[red-knot] simplify subtypes from unions #13401

Merged
merged 2 commits into from
Sep 19, 2024
55 changes: 49 additions & 6 deletions crates/red_knot_python_semantic/src/types.rs
@@ -388,16 +388,18 @@ impl<'db> Type<'db> {
         }
     }

-    /// Return true if this type is [assignable to] type `target`.
+    /// Return true if this type is a [subtype of] type `target`.
     ///
-    /// [assignable to]: https://typing.readthedocs.io/en/latest/spec/concepts.html#the-assignable-to-or-consistent-subtyping-relation
-    pub(crate) fn is_assignable_to(self, db: &'db dyn Db, target: Type<'db>) -> bool {
+    /// [subtype of]: https://typing.readthedocs.io/en/latest/spec/concepts.html#subtype-supertype-and-type-equivalence
+    pub(crate) fn is_subtype_of(self, db: &'db dyn Db, target: Type<'db>) -> bool {
         if self.is_equivalent_to(db, target) {
             return true;
         }
         match (self, target) {
-            (Type::Unknown | Type::Any | Type::Never, _) => true,
-            (_, Type::Unknown | Type::Any) => true,
+            (Type::Unknown | Type::Any, _) => false,
+            (_, Type::Unknown | Type::Any) => false,
+            (Type::Never, _) => true,
+            (_, Type::Never) => false,
             (Type::IntLiteral(_), Type::Instance(class))
                 if class.is_stdlib_symbol(db, "builtins", "int") =>
             {
@@ -417,12 +419,28 @@ impl<'db> Type<'db> {
             (ty, Type::Union(union)) => union
                 .elements(db)
                 .iter()
-                .any(|&elem_ty| ty.is_assignable_to(db, elem_ty)),
+                .any(|&elem_ty| ty.is_subtype_of(db, elem_ty)),
             // TODO
             _ => false,
         }
     }

+    /// Return true if this type is [assignable to] type `target`.
+    ///
+    /// [assignable to]: https://typing.readthedocs.io/en/latest/spec/concepts.html#the-assignable-to-or-consistent-subtyping-relation
+    pub(crate) fn is_assignable_to(self, db: &'db dyn Db, target: Type<'db>) -> bool {
+        match (self, target) {
+            (Type::Unknown | Type::Any, _) => true,
+            (_, Type::Unknown | Type::Any) => true,
+            (ty, Type::Union(union)) => union
+                .elements(db)
+                .iter()
+                .any(|&elem_ty| ty.is_assignable_to(db, elem_ty)),
+            // TODO other types containing gradual forms (e.g. generics containing Any/Unknown)
+            _ => self.is_subtype_of(db, target),
+        }
+    }
+
     /// Return true if this type is equivalent to type `other`.
     pub(crate) fn is_equivalent_to(self, _db: &'db dyn Db, other: Type<'db>) -> bool {
         // TODO equivalent but not identical structural types, differently-ordered unions and
@@ -1132,6 +1150,31 @@ mod tests {
         assert!(!from.into_type(&db).is_assignable_to(&db, to.into_type(&db)));
     }

+    #[test_case(Ty::Never, Ty::IntLiteral(1))]
+    #[test_case(Ty::IntLiteral(1), Ty::BuiltinInstance("int"))]
+    #[test_case(Ty::StringLiteral("foo"), Ty::BuiltinInstance("str"))]
+    #[test_case(Ty::StringLiteral("foo"), Ty::LiteralString)]
+    #[test_case(Ty::LiteralString, Ty::BuiltinInstance("str"))]
+    #[test_case(Ty::BytesLiteral("foo"), Ty::BuiltinInstance("bytes"))]
+    #[test_case(Ty::IntLiteral(1), Ty::Union(vec![Ty::BuiltinInstance("int"), Ty::BuiltinInstance("str")]))]
+    fn is_subtype_of(from: Ty, to: Ty) {
+        let db = setup_db();
+        assert!(from.into_type(&db).is_subtype_of(&db, to.into_type(&db)));
+    }
+
+    #[test_case(Ty::Unknown, Ty::IntLiteral(1))]
+    #[test_case(Ty::Any, Ty::IntLiteral(1))]
+    #[test_case(Ty::IntLiteral(1), Ty::Unknown)]
+    #[test_case(Ty::IntLiteral(1), Ty::Any)]
+    #[test_case(Ty::IntLiteral(1), Ty::Union(vec![Ty::Unknown, Ty::BuiltinInstance("str")]))]
+    #[test_case(Ty::IntLiteral(1), Ty::BuiltinInstance("str"))]
+    #[test_case(Ty::BuiltinInstance("int"), Ty::BuiltinInstance("str"))]
+    #[test_case(Ty::BuiltinInstance("int"), Ty::IntLiteral(1))]
+    fn is_not_subtype_of(from: Ty, to: Ty) {
+        let db = setup_db();
+        assert!(!from.into_type(&db).is_subtype_of(&db, to.into_type(&db)));
+    }
+
     #[test_case(
         Ty::Union(vec![Ty::IntLiteral(1), Ty::IntLiteral(2)]),
         Ty::Union(vec![Ty::IntLiteral(1), Ty::IntLiteral(2)])
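To make the split between the two relations in the diff above easier to follow, here is a minimal, self-contained sketch. The `Ty` enum and the free functions are hypothetical stand-ins, not red-knot's real `Type` or its salsa `db`; only the arm ordering is modeled on the PR. The point it illustrates is that `Any`/`Unknown` are never subtypes or supertypes of a different type, yet are assignable in both directions.

```rust
/// Hypothetical toy model of a few `Type` variants (not the real red-knot enum).
#[derive(Clone, Debug, PartialEq, Eq)]
enum Ty {
    Unknown,
    Any,
    Never,
    Int,
    Str,
    IntLiteral(i64),
    Union(Vec<Ty>),
}

/// Subtyping: gradual forms (`Unknown`/`Any`) do not participate,
/// except that every type is a subtype of itself.
fn is_subtype_of(ty: &Ty, target: &Ty) -> bool {
    if ty == target {
        return true; // stands in for `is_equivalent_to`
    }
    match (ty, target) {
        (Ty::Unknown | Ty::Any, _) => false,
        (_, Ty::Unknown | Ty::Any) => false,
        (Ty::Never, _) => true,
        (_, Ty::Never) => false,
        (Ty::IntLiteral(_), Ty::Int) => true,
        (_, Ty::Union(elements)) => elements.iter().any(|e| is_subtype_of(ty, e)),
        _ => false,
    }
}

/// Assignability: gradual forms are assignable to and from everything;
/// anything else falls back to subtyping.
fn is_assignable_to(ty: &Ty, target: &Ty) -> bool {
    match (ty, target) {
        (Ty::Unknown | Ty::Any, _) => true,
        (_, Ty::Unknown | Ty::Any) => true,
        (_, Ty::Union(elements)) => elements.iter().any(|e| is_assignable_to(ty, e)),
        _ => is_subtype_of(ty, target),
    }
}
```

In this toy model, `IntLiteral(1)` is assignable to `Union[Unknown, Str]` but is not a subtype of it, which matches the `is_not_subtype_of` test case in the diff.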
33 changes: 32 additions & 1 deletion crates/red_knot_python_semantic/src/types/builder.rs
@@ -46,10 +46,23 @@ impl<'db> UnionBuilder<'db> {
     pub(crate) fn add(mut self, ty: Type<'db>) -> Self {
         match ty {
             Type::Union(union) => {
-                self.elements.extend(union.elements(self.db));
+                for element in union.elements(self.db) {
+                    self = self.add(*element);
+                }
             }
             Type::Never => {}
             _ => {
+                let mut remove = vec![];
@hauntsaninja (Contributor), Sep 25, 2024

In equivalent mypy code, I had to add a special fast path for literals. You can do better than quadratic for unions with lots of literals of the same type, which turns out to be a thing in the wild

@carljm (Contributor Author), Sep 25, 2024

Interesting, thanks for pointing this out! I took a look at that optimization in mypy.

I think the case this would optimize is where a union already contains e.g. str, and then we try to add lots of string literal types to it, every one of which is redundant because it's a subtype of str. Rather than going through all existing union members to check if each literal is a subtype of any of them, we can keep a hash-set of "types present in this union which have literal forms" and do an O(1) contains check against that set as the first step when adding a literal type to the union. Framed in more general terms, it's identifying that a certain set of common types have a single super-type that is most likely to rule them out of the union, and so we optimize checking for that most likely super-type by identity.

This makes sense; I'd prefer to wait to add this kind of optimization until we see it crop up in a real-world codebase and can evaluate the actual impact of the optimization in our case, but it's definitely a useful idea to keep in mind.
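A minimal sketch of the fast path as framed in this comment, for illustration only. It is not mypy's implementation and not red-knot code; the `Ty` enum, `literal_fallback`, and `UnionSketch` are hypothetical names. It assumes each literal type has exactly one non-literal "fallback" type, so a single hash lookup can rule the literal out before any linear scan.

```rust
use std::collections::HashSet;

/// Hypothetical toy types; `IntLiteral`/`StrLiteral` fall back to `Int`/`Str`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Ty {
    Int,
    Str,
    IntLiteral(i64),
    StrLiteral(&'static str),
}

impl Ty {
    /// The non-literal type a literal is a subtype of, if `self` is a literal.
    fn literal_fallback(self) -> Option<Ty> {
        match self {
            Ty::IntLiteral(_) => Some(Ty::Int),
            Ty::StrLiteral(_) => Some(Ty::Str),
            Ty::Int | Ty::Str => None,
        }
    }
}

#[derive(Default)]
struct UnionSketch {
    elements: HashSet<Ty>,
}

impl UnionSketch {
    fn add(&mut self, ty: Ty) {
        match ty.literal_fallback() {
            // Fast path: one O(1) lookup decides whether the literal is redundant.
            Some(fallback) => {
                if self.elements.contains(&fallback) {
                    return;
                }
            }
            // Slow path: a non-literal type absorbs its own literals (linear scan).
            None => self.elements.retain(|e| e.literal_fallback() != Some(ty)),
        }
        self.elements.insert(ty);
    }
}
```

Adding `StrLiteral("b")` to a union that already holds `Str` returns after one lookup, regardless of how many other elements the union has.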

Member

I think what would be useful is if we added one (or more) benchmarks based on a real-world codebase that makes heavy use of large literals. (I.e., pydantic.)

Contributor

That's not quite the right description; it's also useful when the union doesn't contain the supertype (i.e. str). For instance, if you were combining two unions that you knew consisted only of literal types, you could use a set union, which is linear. The mypy optimisation I added is basically that, but also works when there are non-literal types thrown in as well. Fair enough on waiting though!
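The literal-only case mentioned here can be sketched as follows. This is an illustrative assumption about the shape of the idea, not mypy's code; `Literal` and `merge_literal_unions` are hypothetical names. Distinct literals are never subtypes of one another, so deduplication by hash is enough and no pairwise subtype check is needed.

```rust
use std::collections::HashSet;

/// Hypothetical literal-only element type.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Literal {
    Int(i64),
    Str(&'static str),
}

/// Merge two unions that contain only literals.
/// One hash insert per element, so the cost is linear in `a.len() + b.len()`.
/// First-seen order is preserved.
fn merge_literal_unions(a: &[Literal], b: &[Literal]) -> Vec<Literal> {
    let mut seen = HashSet::new();
    let mut merged = Vec::new();
    for &lit in a.iter().chain(b) {
        // `insert` returns false when the literal was already present.
        if seen.insert(lit) {
            merged.push(lit);
        }
    }
    merged
}
```

Compared with the subtype-checking loop in `UnionBuilder::add`, which compares each new element against every existing one, this avoids the quadratic behavior for large literal unions.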

Contributor Author

Oh, thanks, yeah, I misread the code. The set is unduplicated_literal_fallbacks, not duplicated_literal_fallbacks. So it looks like it's optimizing only the case you described; the mirror image of the case I described.

Member

I've created a new issue collating some of the perf issues mypy and pyright have encountered relating to unions: #13549

@hauntsaninja (Contributor), Sep 29, 2024

Nice!
I think there are some mypy PRs missing from the list, so if you're interested in code I'd make sure to look at main.
I'll also make it so that, if you're interested in real-world use cases, you should only have to look at primer; it looks like there are 1-2 things I never actually added.

+                for element in &self.elements {
+                    if ty.is_subtype_of(self.db, *element) {
+                        return self;
+                    } else if element.is_subtype_of(self.db, ty) {
+                        remove.push(*element);
+                    }
+                }
+                for element in remove {
+                    self.elements.remove(&element);
+                }
                 self.elements.insert(ty);
             }
         }
@@ -368,6 +381,24 @@ mod tests {
         assert_eq!(union.elements_vec(&db), &[t0, t1, t2]);
     }

+    #[test]
+    fn build_union_simplify_subtype() {
+        let db = setup_db();
+        let t0 = builtins_symbol_ty(&db, "str").to_instance(&db);
+        let t1 = Type::LiteralString;
+        let t2 = Type::Unknown;
+        let u0 = UnionType::from_elements(&db, [t0, t1]);
+        let u1 = UnionType::from_elements(&db, [t1, t0]);
+        let u2 = UnionType::from_elements(&db, [t0, t1, t2]);
+
+        assert_eq!(u0, t0);
+        assert_eq!(u1, t0);
+        assert_eq!(u2.expect_union().elements_vec(&db), &[t0, t2]);
+    }
+
+    #[test]
+    fn build_union_no_simplify_any() {}
+
     impl<'db> IntersectionType<'db> {
         fn pos_vec(self, db: &'db TestDb) -> Vec<Type<'db>> {
             self.positive(db).into_iter().copied().collect()
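The simplification rule in `UnionBuilder::add` can be tried out in isolation with a small toy model. Everything below is a hypothetical stand-in: a `Vec` replaces the builder's element set, and `Ty` models only the string-related types used in the `build_union_simplify_subtype` test. The logic follows the diff in spirit: skip a new type that is a subtype of an existing element, otherwise drop existing elements that are subtypes of the new type.

```rust
/// Hypothetical toy types (not the real red-knot `Type`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Ty {
    Unknown,
    Never,
    Str,
    LiteralString,
    StrLiteral(&'static str),
}

fn is_subtype_of(ty: Ty, target: Ty) -> bool {
    if ty == target {
        return true;
    }
    match (ty, target) {
        // `Unknown` is neither a subtype nor a supertype of other types.
        (Ty::Unknown, _) | (_, Ty::Unknown) => false,
        (Ty::Never, _) => true,
        (Ty::StrLiteral(_), Ty::LiteralString | Ty::Str) => true,
        (Ty::LiteralString, Ty::Str) => true,
        _ => false,
    }
}

/// Add `ty` to the union, keeping it free of redundant subtypes.
fn add(mut elements: Vec<Ty>, ty: Ty) -> Vec<Ty> {
    if ty == Ty::Never {
        return elements; // `Never` contributes nothing to a union
    }
    if elements.iter().any(|&e| is_subtype_of(ty, e)) {
        return elements; // already covered by an existing element
    }
    // The new type absorbs any existing elements that are its subtypes.
    elements.retain(|&e| !is_subtype_of(e, ty));
    elements.push(ty);
    elements
}

fn build_union(types: &[Ty]) -> Vec<Ty> {
    types.iter().fold(Vec::new(), |acc, &ty| add(acc, ty))
}
```

As in the PR's test, `build_union(&[Ty::Str, Ty::LiteralString, Ty::Unknown])` keeps `Str` and `Unknown`: `LiteralString` is absorbed by `Str`, while `Unknown` is not simplified away because it takes no part in subtyping.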
3 changes: 1 addition & 2 deletions crates/red_knot_python_semantic/src/types/infer.rs
@@ -5800,8 +5800,7 @@ mod tests {
             .unwrap();
         db.write_file("/src/c.pyi", "x: int").unwrap();

-        // TODO this should simplify to just 'int'
-        assert_public_ty(&db, "/src/a.py", "x", "int | Literal[1]");
+        assert_public_ty(&db, "/src/a.py", "x", "int");
     }

     // Incremental inference tests