[red-knot] Support re-export conventions for stub files #16073

dhruvmanila · 2025-02-10T12:00:04Z

This is an alternative implementation to #15848.

Summary

This PR adds support for re-export conventions for imports for stub files.

How does this work?

Add a new flag on the Import and ImportFrom definitions to indicate whether they're being exported or not
Add a new enum to indicate whether the symbol lookup is happening within the same file or is being queried from another file (e.g., an import statement)
When a Symbol is being queried, we'll skip the definitions that are (a) coming from a stub file (b) external lookup and (c) check the re-export flag on the definition

This implementation does not yet support __all__ and * imports as both are features that needs to be implemented independently.

closes: #14099
closes: #15476

Test Plan

Add test cases, update existing ones if required.

crates/red_knot_python_semantic/src/types.rs

codspeed-hq · 2025-02-10T12:05:13Z

CodSpeed Performance Report

Merging #16073 will degrade performances by 4.19%

_{Comparing dhruv/re-export-2 (86c5876) with main (366ae1f)}

Summary

❌ 1 regressions
✅ 31 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
❌	`red_knot_check_file[incremental]`	4.5 ms	4.7 ms	-4.19%

MichaReiser · 2025-02-11T08:13:10Z

crates/red_knot_python_semantic/src/types.rs

+        lookup: SymbolLookup,
        scope: ScopeId<'db>,
        is_dunder_slots: bool,
        symbol_id: ScopedSymbolId,


Hmm, I must have overlooked this when this query was introduced. Salsa queries that have more than one argument (other than db) aren't very performant because salsa has to intern all arguments. It's the same as

#[salsa::interned] struct SymbolByIdArgs<'db> { lookup: SymbolLookup, scope: ScopeId<'db>, is_dunder_slots: bool, symbol_id: ScopedSymbolId, } #[salsa::tracked] fn symbol_by_id(db: &dyn Db, args: SymboLByIdArgs) { ... }

I wonder if this is the reason for the perf regression.

We could try if splitting the query into a internal_symbol_by_id and external_symbol_by_id query reduces the perf regression (and possibly external_dunder_slots). But we can wait with this until the PR is close to finish.

I don't there's a way to get to a single argumet function because we don't have a SymbolId salsa ingredient

Yeah, that could be. Thanks for the suggestions, I can try that out.

It does seem plausible that adding this argument increases incremental re-validation cost, since now there will be two cached memos for this query instead of one, for every global symbol that is both imported and referenced inside its file.

If this is the case, I suspect that splitting into multiple queries may not help? I would expect it to result in exactly the same increase in the total number of memos that need validating, those memos will just be split across two queries instead of being all instances of the same query.

We could reconsider whether this needs to be a query at all? I think we made it one because empirically it was a perf win to do so (despite the multiple-arguments issue), but we could check whether that is still the case after this PR. (I don't think we need it to be a query for isolation reasons, because the cross-module calls to this should all already have gone through an infer_definition_types query on the import statement.)

Another thing that could help is to check whether the target of an import is a stub file much earlier (before calling this query), and modify the semantics of the enum so that it's not "internal" vs "external" but rather "ImportFromStub" vs "Normal". That way we would only double the number of cached values for global symbols in stub files, not for global symbols in all modules. (The effect of this might be understated by our tomllib benchmark, since it's small and probably over-indexes on typeshed, relative to a checking a larger project.)

MichaReiser · 2025-02-11T08:18:16Z

crates/red_knot_python_semantic/src/types.rs

+#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
+pub(crate) enum SymbolLookup {
+    Internal,
+    External,
+}


I wonder if the internal and external should be an internal type only to simplify the implementation but that we otherwise use internal_global_symbol, external_global_symbol methods instead (internal global symbol seems weird :)).

I thought of doing that but quickly realize the internal_global_symbol name issue. For now, I just went with it to complete it and finish the implementation. I'll do some name / structural changes now with the benchmarking.

dhruvmanila · 2025-02-11T11:49:16Z

Performance notes (on-going)

Tried removing the lookup: SymbolLookup field and instead have two salsa query but that doesn’t improve the performance (at least locally) (e8ca883 - going to check if Codspeed yields the same results).

Patch:

diff --git a/crates/red_knot_python_semantic/src/types.rs b/crates/red_knot_python_semantic/src/types.rs
index 390675f32..a4c68ea63 100644
--- a/crates/red_knot_python_semantic/src/types.rs
+++ b/crates/red_knot_python_semantic/src/types.rs
@@ -126,6 +126,37 @@ fn symbol<'db>(
     name: &str,
 ) -> Symbol<'db> {
     #[salsa::tracked]
+    fn internal_symbol_by_id<'db>(
+        db: &'db dyn Db,
+        scope: ScopeId<'db>,
+        is_dunder_slots: bool,
+        symbol_id: ScopedSymbolId,
+    ) -> Symbol<'db> {
+        symbol_by_id(
+            db,
+            SymbolLookup::Internal,
+            scope,
+            is_dunder_slots,
+            symbol_id,
+        )
+    }
+
+    #[salsa::tracked]
+    fn external_symbol_by_id<'db>(
+        db: &'db dyn Db,
+        scope: ScopeId<'db>,
+        is_dunder_slots: bool,
+        symbol_id: ScopedSymbolId,
+    ) -> Symbol<'db> {
+        symbol_by_id(
+            db,
+            SymbolLookup::External,
+            scope,
+            is_dunder_slots,
+            symbol_id,
+        )
+    }
+
     fn symbol_by_id<'db>(
         db: &'db dyn Db,
         lookup: SymbolLookup,
@@ -233,7 +264,10 @@ fn symbol<'db>(
     let is_dunder_slots = name == "__slots__";
     table
         .symbol_id_by_name(name)
-        .map(|symbol| symbol_by_id(db, lookup, scope, is_dunder_slots, symbol))
+        .map(|symbol| match lookup {
+            SymbolLookup::Internal => internal_symbol_by_id(db, scope, is_dunder_slots, symbol),
+            SymbolLookup::External => external_symbol_by_id(db, scope, is_dunder_slots, symbol),
+        })
         .unwrap_or(Symbol::Unbound)
 }

I compared the trace logs between main and re-export-2 using the debug build on tomllib with the following command (includes the now merged coarse-grained commit):

RED_KNOT_PROFILE_LOG=1 RAYON_NUM_THREADS=1 red_knot check -vvv

Cold: https://www.diffchecker.com/JqHRR99N/

There are more calls to symbol("object") compared to before
The number of ingredients are the same (as expected)
Reduced number of symbol("__getitem__") calls
Increased number of symbol_by_id salsa query

Incremental: https://www.diffchecker.com/KqLUCdie/

Similar observations as in the cold benchmark
For the incremental part, I’m seeing a lot of DidValidateMemoizedValue salsa events. As per the docs (mentioned below), is it that now the incremental benchmark is going through more number of queries as compared to on main and that a lot of time is spent on validating whether salsa can re-use the cached value or not.

    /// Occurs when we found that all inputs to a memoized value are
    /// up-to-date and hence the value can be re-used without
    /// executing the closure.
    ///
    /// Executes before the "re-used" value is returned.

NOTE: There's still a 4% regression on the incremental benchmark which I want to look into.

MichaReiser · 2025-02-11T12:42:46Z

Thanks for the analysis. Calling more salsa queries is probably the culprit because we pay an overhead for every salsa that requires "checking" if it's still green during the incremental benchmark.

…ookup" This reverts commit e8ca883.

dhruvmanila · 2025-02-12T08:25:33Z

There's still a 4% regression which I want to look into but I'd find it useful to get some initial feedback on the implementation itself which is why I'm marking this as ready for review.

MichaReiser · 2025-02-12T08:36:00Z

crates/red_knot_python_semantic/src/types.rs

@@ -4933,7 +4980,7 @@ pub(crate) mod tests {
        )?;

        let bar = system_path_to_file(&db, "src/bar.py")?;
-        let a = global_symbol(&db, bar, "a");
+        let a = global_symbol(&db, SymbolLookup::Internal, bar, "a");


Looking through the usages of global_symbol and symbol it does seem to me that the majority of call sites use Internal. Requiring the parameter for all call sites makes the API significantly more verbose. I think it could be useful to expose two methods instead of making this a parameter. I'm not familiar enough with the subject to propose good names.

I'd probably use symbol and global_symbol for the SymbolLookup variant that's most commonly used (which is the right default in most cases) and have a method with a more explicit name for the other variant.

I agree with this, for clarity/ergonomic reasons. I think we can just add a new imported_symbol function (same as global_symbol but with external lookup).

dhruvmanila · 2025-02-12T12:24:59Z

crates/red_knot_python_semantic/src/types.rs

+pub(crate) enum SymbolLookup {
+    /// Look up the symbol as seen from within the same module.
+    Internal,
+    /// Look up the symbol as seen from outside the module.
+    External,
+}


Another idea:

pub(crate) enum SymbolResolution { /// Resolve the symbol as seen from within the current module Local, /// Resolve the symbol as it would be seen when imported from another module Imported, }

This could lead to the following functions:

global_symbol

imported_global_symbol

symbol

imported_symbol

where, (1) and (3) defaults to Local and (2) and (4) defaults to Imported

It's not possible to import a non-global symbol, so I think item 4 in that table is unnecessary? It should just be the first three possibilities.

As far as the naming of this enum, I think I weakly prefer "Lookup" over "Resolution". I think for the enum values my main concern is that it's easy to confuse "internal to a scope" and "external to a scope" vs "internal to a file" and "external to a file". Scope internal/external is our existing distinction of local vs "public" symbol types; all of these functions are for "public" types, since for local types we just symbol_from_bindings with the bindings for that particular use. So for that reason I think I would prefer being extra explicit about that part in the variant names, not just in the docstrings, e.g. SameFile and OtherFile? Or SameFile and Imported? Or FileInternal and Imported?

carljm · 2025-02-12T19:24:20Z

There's still a 4% regression

Bummer, I thought there was a reasonable chance this approach would eliminate the regression. I guess it is coming from somewhere that both approaches have in common...

carljm

Nice, I think this approach looks really good.

carljm · 2025-02-12T20:46:54Z

crates/red_knot_python_semantic/resources/mdtest/import/conventions.md

+
+## Re-exported symbols in stub files
+
+When a symbol is re-exported, imporing it should not raise an error. This tests both `import ...`


Suggested change

When a symbol is re-exported, imporing it should not raise an error. This tests both `import ...`

When a symbol is re-exported, importing it should not raise an error. This tests both `import ...`

carljm · 2025-02-12T21:00:36Z

crates/red_knot_python_semantic/src/types.rs

+pub(crate) enum SymbolLookup {
+    /// Look up the symbol as seen from within the same module.
+    Internal,
+    /// Look up the symbol as seen from outside the module.
+    External,
+}


It's not possible to import a non-global symbol, so I think item 4 in that table is unnecessary? It should just be the first three possibilities.

As far as the naming of this enum, I think I weakly prefer "Lookup" over "Resolution". I think for the enum values my main concern is that it's easy to confuse "internal to a scope" and "external to a scope" vs "internal to a file" and "external to a file". Scope internal/external is our existing distinction of local vs "public" symbol types; all of these functions are for "public" types, since for local types we just symbol_from_bindings with the bindings for that particular use. So for that reason I think I would prefer being extra explicit about that part in the variant names, not just in the docstrings, e.g. SameFile and OtherFile? Or SameFile and Imported? Or FileInternal and Imported?

carljm · 2025-02-12T21:11:01Z

crates/red_knot_python_semantic/src/types.rs

+        lookup: SymbolLookup,
        scope: ScopeId<'db>,
        is_dunder_slots: bool,
        symbol_id: ScopedSymbolId,


It does seem plausible that adding this argument increases incremental re-validation cost, since now there will be two cached memos for this query instead of one, for every global symbol that is both imported and referenced inside its file.

If this is the case, I suspect that splitting into multiple queries may not help? I would expect it to result in exactly the same increase in the total number of memos that need validating, those memos will just be split across two queries instead of being all instances of the same query.

We could reconsider whether this needs to be a query at all? I think we made it one because empirically it was a perf win to do so (despite the multiple-arguments issue), but we could check whether that is still the case after this PR. (I don't think we need it to be a query for isolation reasons, because the cross-module calls to this should all already have gone through an infer_definition_types query on the import statement.)

Another thing that could help is to check whether the target of an import is a stub file much earlier (before calling this query), and modify the semantics of the enum so that it's not "internal" vs "external" but rather "ImportFromStub" vs "Normal". That way we would only double the number of cached values for global symbols in stub files, not for global symbols in all modules. (The effect of this might be understated by our tomllib benchmark, since it's small and probably over-indexes on typeshed, relative to a checking a larger project.)

carljm · 2025-02-12T21:33:33Z

crates/red_knot_python_semantic/src/types.rs

@@ -4933,7 +4980,7 @@ pub(crate) mod tests {
        )?;

        let bar = system_path_to_file(&db, "src/bar.py")?;
-        let a = global_symbol(&db, bar, "a");
+        let a = global_symbol(&db, SymbolLookup::Internal, bar, "a");


I agree with this, for clarity/ergonomic reasons. I think we can just add a new imported_symbol function (same as global_symbol but with external lookup).

carljm · 2025-02-12T21:41:26Z

crates/red_knot_python_semantic/src/types.rs

-    } else {
-        Truthiness::AlwaysFalse
+    let is_non_exported = |binding: Definition<'db>| {
+        lookup.is_external() && !binding.is_reexported(db) && binding.in_stub(db)


May not matter much, but in_stub() is something we could check just once for the entire lookup (based on the file) and update the lookup value we pass in here based on that, instead of re-checking in_stub for every binding (and below for every declaration), when it will be the same for all of them. (This is related to another comment above about reducing the number of cached memos to re-validate in the incremental benchmark.)

Similarly, I don't know how smart the Rust compiler is here or whether it would make any difference, but it seems like we could define the closure function conditionally based on lookup.is_external(), so that if it's not external the closure is a no-op, instead of re-checking lookup.is_external() for every binding/declaration.

dhruvmanila added the red-knot Multi-file analysis & type inference label Feb 10, 2025

dhruvmanila commented Feb 10, 2025

View reviewed changes

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved

dhruvmanila force-pushed the dhruv/re-export-2 branch from 21b94d3 to 18910d9 Compare February 11, 2025 07:24

dhruvmanila changed the title ~~WIP: Implement re-export via semantic index builder~~ Implement re-export via semantic index builder Feb 11, 2025

dhruvmanila force-pushed the dhruv/re-export-2 branch 3 times, most recently from 3a5466c to 93c9d41 Compare February 11, 2025 07:49

dhruvmanila mentioned this pull request Feb 11, 2025

[red-knot] Add support for re-export conventions for imports #15848

Closed

dhruvmanila changed the title ~~Implement re-export via semantic index builder~~ [red-knot] Add support for re-export conventions for imports Feb 11, 2025

dhruvmanila changed the title ~~[red-knot] Add support for re-export conventions for imports~~ [red-knot] Support re-export conventions for stub files Feb 11, 2025

MichaReiser reviewed Feb 11, 2025

View reviewed changes

dhruvmanila force-pushed the dhruv/re-export-2 branch from 93c9d41 to e8ca883 Compare February 11, 2025 11:50

dhruvmanila added 4 commits February 12, 2025 12:30

Implement re-export via semantic index builder

5acbe45

perf: try splitting the salsa query for internal / external lookup

dafc135

Remove debug log

9e88fc4

Revert "perf: try splitting the salsa query for internal / external l…

14e1907

…ookup" This reverts commit e8ca883.

dhruvmanila force-pushed the dhruv/re-export-2 branch from 59a7c3f to 1539f98 Compare February 12, 2025 07:00

Switch conditional check

86c5876

dhruvmanila force-pushed the dhruv/re-export-2 branch from 1539f98 to 86c5876 Compare February 12, 2025 08:24

dhruvmanila marked this pull request as ready for review February 12, 2025 08:25

dhruvmanila requested review from carljm, AlexWaygood and sharkdp as code owners February 12, 2025 08:25

MichaReiser reviewed Feb 12, 2025

View reviewed changes

dhruvmanila commented Feb 12, 2025

View reviewed changes

carljm approved these changes Feb 12, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[red-knot] Support re-export conventions for stub files #16073

[red-knot] Support re-export conventions for stub files #16073

dhruvmanila commented Feb 10, 2025 •

edited

Loading

codspeed-hq bot commented Feb 10, 2025 •

edited

Loading

MichaReiser Feb 11, 2025

dhruvmanila Feb 11, 2025

carljm Feb 12, 2025

MichaReiser Feb 11, 2025

dhruvmanila Feb 11, 2025

dhruvmanila commented Feb 11, 2025 •

edited

Loading

MichaReiser commented Feb 11, 2025

dhruvmanila commented Feb 12, 2025

MichaReiser Feb 12, 2025 •

edited

Loading

carljm Feb 12, 2025

dhruvmanila Feb 12, 2025

dhruvmanila Feb 12, 2025

carljm Feb 12, 2025

carljm commented Feb 12, 2025

carljm left a comment

carljm Feb 12, 2025

carljm Feb 12, 2025

carljm Feb 12, 2025

carljm Feb 12, 2025

carljm Feb 12, 2025


		## Re-exported symbols in stub files

		When a symbol is re-exported, imporing it should not raise an error. This tests both `import ...`

	When a symbol is re-exported, imporing it should not raise an error. This tests both `import ...`
	When a symbol is re-exported, importing it should not raise an error. This tests both `import ...`

[red-knot] Support re-export conventions for stub files #16073

Are you sure you want to change the base?

[red-knot] Support re-export conventions for stub files #16073

Conversation

dhruvmanila commented Feb 10, 2025 • edited Loading

Summary

Test Plan

codspeed-hq bot commented Feb 10, 2025 • edited Loading

Merging #16073 will degrade performances by 4.19%

Summary

Benchmarks breakdown

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dhruvmanila commented Feb 11, 2025 • edited Loading

Performance notes (on-going)

MichaReiser commented Feb 11, 2025

dhruvmanila commented Feb 12, 2025

MichaReiser Feb 12, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

carljm commented Feb 12, 2025

carljm left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dhruvmanila commented Feb 10, 2025 •

edited

Loading

codspeed-hq bot commented Feb 10, 2025 •

edited

Loading

dhruvmanila commented Feb 11, 2025 •

edited

Loading

MichaReiser Feb 12, 2025 •

edited

Loading