-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[red-knot] Support re-export conventions for stub files #16073
base: main
Are you sure you want to change the base?
Conversation
CodSpeed Performance ReportMerging #16073 will degrade performances by 4.19%Comparing Summary
Benchmarks breakdown
|
21b94d3
to
18910d9
Compare
3a5466c
to
93c9d41
Compare
lookup: SymbolLookup, | ||
scope: ScopeId<'db>, | ||
is_dunder_slots: bool, | ||
symbol_id: ScopedSymbolId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I must have overlooked this when this query was introduced. Salsa queries that have more than one argument (other than db
) aren't very performant because salsa has to intern all arguments. It's the same as
#[salsa::interned]
struct SymbolByIdArgs<'db> {
lookup: SymbolLookup,
scope: ScopeId<'db>,
is_dunder_slots: bool,
symbol_id: ScopedSymbolId,
}
#[salsa::tracked]
fn symbol_by_id(db: &dyn Db, args: SymboLByIdArgs) { ... }
I wonder if this is the reason for the perf regression.
We could try if splitting the query into a internal_symbol_by_id
and external_symbol_by_id
query reduces the perf regression (and possibly external_dunder_slots
). But we can wait with this until the PR is close to finish.
I don't there's a way to get to a single argumet function because we don't have a SymbolId
salsa ingredient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that could be. Thanks for the suggestions, I can try that out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does seem plausible that adding this argument increases incremental re-validation cost, since now there will be two cached memos for this query instead of one, for every global symbol that is both imported and referenced inside its file.
If this is the case, I suspect that splitting into multiple queries may not help? I would expect it to result in exactly the same increase in the total number of memos that need validating, those memos will just be split across two queries instead of being all instances of the same query.
We could reconsider whether this needs to be a query at all? I think we made it one because empirically it was a perf win to do so (despite the multiple-arguments issue), but we could check whether that is still the case after this PR. (I don't think we need it to be a query for isolation reasons, because the cross-module calls to this should all already have gone through an infer_definition_types
query on the import statement.)
Another thing that could help is to check whether the target of an import is a stub file much earlier (before calling this query), and modify the semantics of the enum so that it's not "internal" vs "external" but rather "ImportFromStub" vs "Normal". That way we would only double the number of cached values for global symbols in stub files, not for global symbols in all modules. (The effect of this might be understated by our tomllib benchmark, since it's small and probably over-indexes on typeshed, relative to a checking a larger project.)
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)] | ||
pub(crate) enum SymbolLookup { | ||
Internal, | ||
External, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the internal
and external
should be an internal type only to simplify the implementation but that we otherwise use internal_global_symbol
, external_global_symbol
methods instead (internal global symbol seems weird :)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of doing that but quickly realize the internal_global_symbol
name issue. For now, I just went with it to complete it and finish the implementation. I'll do some name / structural changes now with the benchmarking.
Performance notes (on-going)Tried removing the Patch:
diff --git a/crates/red_knot_python_semantic/src/types.rs b/crates/red_knot_python_semantic/src/types.rs
index 390675f32..a4c68ea63 100644
--- a/crates/red_knot_python_semantic/src/types.rs
+++ b/crates/red_knot_python_semantic/src/types.rs
@@ -126,6 +126,37 @@ fn symbol<'db>(
name: &str,
) -> Symbol<'db> {
#[salsa::tracked]
+ fn internal_symbol_by_id<'db>(
+ db: &'db dyn Db,
+ scope: ScopeId<'db>,
+ is_dunder_slots: bool,
+ symbol_id: ScopedSymbolId,
+ ) -> Symbol<'db> {
+ symbol_by_id(
+ db,
+ SymbolLookup::Internal,
+ scope,
+ is_dunder_slots,
+ symbol_id,
+ )
+ }
+
+ #[salsa::tracked]
+ fn external_symbol_by_id<'db>(
+ db: &'db dyn Db,
+ scope: ScopeId<'db>,
+ is_dunder_slots: bool,
+ symbol_id: ScopedSymbolId,
+ ) -> Symbol<'db> {
+ symbol_by_id(
+ db,
+ SymbolLookup::External,
+ scope,
+ is_dunder_slots,
+ symbol_id,
+ )
+ }
+
fn symbol_by_id<'db>(
db: &'db dyn Db,
lookup: SymbolLookup,
@@ -233,7 +264,10 @@ fn symbol<'db>(
let is_dunder_slots = name == "__slots__";
table
.symbol_id_by_name(name)
- .map(|symbol| symbol_by_id(db, lookup, scope, is_dunder_slots, symbol))
+ .map(|symbol| match lookup {
+ SymbolLookup::Internal => internal_symbol_by_id(db, scope, is_dunder_slots, symbol),
+ SymbolLookup::External => external_symbol_by_id(db, scope, is_dunder_slots, symbol),
+ })
.unwrap_or(Symbol::Unbound)
}
I compared the trace logs between RED_KNOT_PROFILE_LOG=1 RAYON_NUM_THREADS=1 red_knot check -vvv Cold: https://www.diffchecker.com/JqHRR99N/
Incremental: https://www.diffchecker.com/KqLUCdie/
/// Occurs when we found that all inputs to a memoized value are
/// up-to-date and hence the value can be re-used without
/// executing the closure.
///
/// Executes before the "re-used" value is returned.
NOTE: There's still a 4% regression on the incremental benchmark which I want to look into. |
93c9d41
to
e8ca883
Compare
Thanks for the analysis. Calling more salsa queries is probably the culprit because we pay an overhead for every salsa that requires "checking" if it's still green during the incremental benchmark. |
59a7c3f
to
1539f98
Compare
1539f98
to
86c5876
Compare
There's still a 4% regression which I want to look into but I'd find it useful to get some initial feedback on the implementation itself which is why I'm marking this as ready for review. |
@@ -4933,7 +4980,7 @@ pub(crate) mod tests { | |||
)?; | |||
|
|||
let bar = system_path_to_file(&db, "src/bar.py")?; | |||
let a = global_symbol(&db, bar, "a"); | |||
let a = global_symbol(&db, SymbolLookup::Internal, bar, "a"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking through the usages of global_symbol
and symbol
it does seem to me that the majority of call sites use Internal
. Requiring the parameter for all call sites makes the API significantly more verbose. I think it could be useful to expose two methods instead of making this a parameter. I'm not familiar enough with the subject to propose good names.
I'd probably use symbol
and global_symbol
for the SymbolLookup
variant that's most commonly used (which is the right default in most cases) and have a method with a more explicit name for the other variant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this, for clarity/ergonomic reasons. I think we can just add a new imported_symbol
function (same as global_symbol
but with external lookup).
pub(crate) enum SymbolLookup { | ||
/// Look up the symbol as seen from within the same module. | ||
Internal, | ||
/// Look up the symbol as seen from outside the module. | ||
External, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another idea:
pub(crate) enum SymbolResolution {
/// Resolve the symbol as seen from within the current module
Local,
/// Resolve the symbol as it would be seen when imported from another module
Imported,
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could lead to the following functions:
global_symbol
imported_global_symbol
symbol
imported_symbol
where, (1) and (3) defaults to Local
and (2) and (4) defaults to Imported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not possible to import a non-global symbol, so I think item 4 in that table is unnecessary? It should just be the first three possibilities.
As far as the naming of this enum, I think I weakly prefer "Lookup" over "Resolution". I think for the enum values my main concern is that it's easy to confuse "internal to a scope" and "external to a scope" vs "internal to a file" and "external to a file". Scope internal/external is our existing distinction of local vs "public" symbol types; all of these functions are for "public" types, since for local types we just symbol_from_bindings
with the bindings for that particular use. So for that reason I think I would prefer being extra explicit about that part in the variant names, not just in the docstrings, e.g. SameFile
and OtherFile
? Or SameFile
and Imported
? Or FileInternal
and Imported
?
Bummer, I thought there was a reasonable chance this approach would eliminate the regression. I guess it is coming from somewhere that both approaches have in common... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I think this approach looks really good.
|
||
## Re-exported symbols in stub files | ||
|
||
When a symbol is re-exported, imporing it should not raise an error. This tests both `import ...` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a symbol is re-exported, imporing it should not raise an error. This tests both `import ...` | |
When a symbol is re-exported, importing it should not raise an error. This tests both `import ...` |
pub(crate) enum SymbolLookup { | ||
/// Look up the symbol as seen from within the same module. | ||
Internal, | ||
/// Look up the symbol as seen from outside the module. | ||
External, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not possible to import a non-global symbol, so I think item 4 in that table is unnecessary? It should just be the first three possibilities.
As far as the naming of this enum, I think I weakly prefer "Lookup" over "Resolution". I think for the enum values my main concern is that it's easy to confuse "internal to a scope" and "external to a scope" vs "internal to a file" and "external to a file". Scope internal/external is our existing distinction of local vs "public" symbol types; all of these functions are for "public" types, since for local types we just symbol_from_bindings
with the bindings for that particular use. So for that reason I think I would prefer being extra explicit about that part in the variant names, not just in the docstrings, e.g. SameFile
and OtherFile
? Or SameFile
and Imported
? Or FileInternal
and Imported
?
lookup: SymbolLookup, | ||
scope: ScopeId<'db>, | ||
is_dunder_slots: bool, | ||
symbol_id: ScopedSymbolId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does seem plausible that adding this argument increases incremental re-validation cost, since now there will be two cached memos for this query instead of one, for every global symbol that is both imported and referenced inside its file.
If this is the case, I suspect that splitting into multiple queries may not help? I would expect it to result in exactly the same increase in the total number of memos that need validating, those memos will just be split across two queries instead of being all instances of the same query.
We could reconsider whether this needs to be a query at all? I think we made it one because empirically it was a perf win to do so (despite the multiple-arguments issue), but we could check whether that is still the case after this PR. (I don't think we need it to be a query for isolation reasons, because the cross-module calls to this should all already have gone through an infer_definition_types
query on the import statement.)
Another thing that could help is to check whether the target of an import is a stub file much earlier (before calling this query), and modify the semantics of the enum so that it's not "internal" vs "external" but rather "ImportFromStub" vs "Normal". That way we would only double the number of cached values for global symbols in stub files, not for global symbols in all modules. (The effect of this might be understated by our tomllib benchmark, since it's small and probably over-indexes on typeshed, relative to a checking a larger project.)
@@ -4933,7 +4980,7 @@ pub(crate) mod tests { | |||
)?; | |||
|
|||
let bar = system_path_to_file(&db, "src/bar.py")?; | |||
let a = global_symbol(&db, bar, "a"); | |||
let a = global_symbol(&db, SymbolLookup::Internal, bar, "a"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this, for clarity/ergonomic reasons. I think we can just add a new imported_symbol
function (same as global_symbol
but with external lookup).
} else { | ||
Truthiness::AlwaysFalse | ||
let is_non_exported = |binding: Definition<'db>| { | ||
lookup.is_external() && !binding.is_reexported(db) && binding.in_stub(db) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May not matter much, but in_stub()
is something we could check just once for the entire lookup (based on the file) and update the lookup
value we pass in here based on that, instead of re-checking in_stub
for every binding (and below for every declaration), when it will be the same for all of them. (This is related to another comment above about reducing the number of cached memos to re-validate in the incremental benchmark.)
Similarly, I don't know how smart the Rust compiler is here or whether it would make any difference, but it seems like we could define the closure function conditionally based on lookup.is_external()
, so that if it's not external the closure is a no-op, instead of re-checking lookup.is_external()
for every binding/declaration.
This is an alternative implementation to #15848.
Summary
This PR adds support for re-export conventions for imports for stub files.
How does this work?
Import
andImportFrom
definitions to indicate whether they're being exported or notSymbol
is being queried, we'll skip the definitions that are (a) coming from a stub file (b) external lookup and (c) check the re-export flag on the definitionThis implementation does not yet support
__all__
and*
imports as both are features that needs to be implemented independently.closes: #14099
closes: #15476
Test Plan
Add test cases, update existing ones if required.