-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
red-knot: infer string literal types #13113
Conversation
Introduce a `StringLiteralType` with corresponding `Display` type and a relatively basic test that the resulting representation is as expected. Note: we currently always allocate for `StringLiteral` types. This may end up being a perf issue later, at which point we may want to look at other ways of representing `value` here, i.e. with some kind of smarter string structure which can reuse types. That is most likely to show up with e.g. concatenation.
// TODO fix this comment with whatever it is we should say we | ||
// want to do in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect this is probably something in the space of what we say for BytesLiteral
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple small changes, but generally looks great!!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! (Sorry for the post-merge review; I've been out for most of the day)
@@ -278,6 +280,10 @@ impl<'db> Type<'db> { | |||
Type::Unknown | |||
} | |||
Type::BooleanLiteral(_) => Type::Unknown, | |||
Type::StringLiteral(_) => { | |||
// TODO defer to Type::Instance(<str from typeshed>).member |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a somewhat annoying and pedantic nitpick from me, but: this comment isn't strictly accurate. In between Literal[<some string value>]
and str
, there's another Python type: typing.LiteralString
. Explaining exactly how LiteralString
works would take quite a few words and isn't really relevant to this PR 😄 but for now I think I'd just use this as the TODO comment instead:
// TODO defer to Type::Instance(<str from typeshed>).member | |
// TODO defer to `typing.LiteralString`/`builtins.str` | |
// methods from typeshed's stubs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the relevance to this specific TODO comment is that for many members of a string Literal
, we can infer the return type as a LiteralString
instead of str
? This is true. It doesn't seem critical to me to reflect that detail in this TODO comment, though it certainly doesn't hurt. (The other nitpick on the TODO for both this and bytes is that for some methods we can probably safely infer an actual Literal
type for the member, e.g. we can infer Literal["foo"].upper()
to be Literal["FOO"]
-- but I also didn't think we needed to mention that in the TODO.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, yeah -- as I say, it's kind of a pedantic nitpick. But given the problems that mypy has had with implementing LiteralString
(well, it still doesn't actually have an implementation, because it requires rewriting a lot of assumptions across the code), I'd prefer for us to keep LiteralString
in mind from the very beginning when thinking about implementing str
and Literal[<some string value>]
types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update in a follow-on PR tomorrow – and I am personally a big fan of pedantic nitpicks when it comes to stuff like this. 😅
#[salsa::interned] | ||
pub struct StringLiteralType<'db> { | ||
#[return_ref] | ||
value: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could use Box<str>
here, to save a bit of space, since the data should always be immutable? Though it maybe wouldn't make much difference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great call; this is probably worth doing, considering it saves an entire usize of memory, which is eight ascii characters (on a 64-bit system); it could halve the memory used for a lot of small string literal types. (Not really, if you consider the Salsa interning overhead as well. But still, not insignificant if there are lots of small string literals.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! Yeah, I consistently forget that Box<str>
is an option. 👍🏼
Summary
Introduce a
StringLiteralType
with correspondingDisplay
type and a relatively basic test that the resulting representation is as expected.Note: we currently always allocate for
StringLiteral
types. This may end up being a perf issue later, at which point we may want to look at other ways of representingvalue
here, i.e. with some kind of smarter string structure which can reuse types. That is most likely to show up with e.g. concatenation.Contributes to #12701.
Test Plan
Added a test for individual strings with both single and double quotes as well as concatenated strings with both forms.