
[red-knot] Combine terminal statement support with statically known branches #15817

Merged
36 commits merged into main from dcreager/static-terminal on Feb 5, 2025

Conversation

@dcreager (Member) commented Jan 29, 2025

This example from @sharkdp shows how terminal statements can appear in statically known branches: #15676 (comment)

def _(cond: bool):
    x = "a"
    if cond:
        x = "b"
        if True:
            return

    reveal_type(x)  # revealed: Literal["a", "b"]; should be Literal["a"]

We now use visibility constraints to track reachability, which allows us to model this correctly. There are two related changes as a result:

  • New bindings are not assumed to be visible; they inherit the current "scope start" visibility, which effectively means that new bindings are visible if/when the current flow is reachable

  • When simplifying visibility constraints after branching control flow, we only simplify if none of the intervening branches included a terminal statement. That is, earlier unaffected bindings are only actually unaffected if all branches make it to the merge point (see the sketch below).
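A minimal sketch of these two rules (illustrative names only, not the actual red-knot types; the real implementation interns visibility constraints rather than using a two-valued enum):

#[derive(Clone, Copy)]
enum Visibility {
    AlwaysTrue,
    AlwaysFalse,
}

struct FlowState {
    // Visibility inherited by new bindings; becomes AlwaysFalse once a
    // terminal statement makes the current flow unreachable.
    scope_start_visibility: Visibility,
    // One visibility constraint per binding recorded so far.
    bindings: Vec<Visibility>,
}

impl FlowState {
    // Rule 1: a new binding is not assumed to be visible; it inherits the
    // current scope-start visibility, so a binding recorded after a terminal
    // statement stays invisible.
    fn record_binding(&mut self) {
        self.bindings.push(self.scope_start_visibility);
    }

    // Rule 2: after merging branches, only simplify back to the pre-branch
    // state if every branch reached the merge point. If any branch ended in a
    // terminal statement, keep the merged constraints as they are.
    fn simplify_after_merge(&mut self, pre_branch: &FlowState, any_branch_terminated: bool) {
        if any_branch_terminated {
            return;
        }
        for (current, old) in self.bindings.iter_mut().zip(&pre_branch.bindings) {
            *current = *old;
        }
    }
}

The real constraints are richer than this enum, but the early return is the important part: skipping the reset is what preserves the unreachability introduced by a return inside a branch.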

@AlexWaygood AlexWaygood added the red-knot Multi-file analysis & type inference label Jan 29, 2025
@dcreager dcreager marked this pull request as ready for review January 30, 2025 16:40
@dcreager dcreager marked this pull request as draft January 30, 2025 16:41
@dcreager (Member, Author) left a comment

This is not currently working because of visibility constraint simplification:

def _(cond: bool):
    x = "a"
    if cond:
        x = "b"
        # ← {0}
        if True:
            return
            # ← {1}
        # ← {2}

    reveal_type(x)  # revealed: Literal["a"]

At point {1}, we're marking the x = "b" binding as non-visible (by setting a visibility constraint of ALWAYS_FALSE).

But at point {2}, after we merge the two flows back together (the then flow and the artificially inserted else flow), we simplify the result relative to point {0}. The simplification sees that there weren't any new bindings of x, and resets the visibility of x = "b" back to what it was at point {0}, forgetting the unreachability that we just introduced.

So I think I need to skip the simplification step when a flow contains a terminal statement.

// should be completely overwritten by the snapshot we're merging in. If the other snapshot
// is unreachable, we should return without merging.
if !snapshot.reachable {
// As an optimization, if we know statically that either of the snapshots is always
@dcreager (Member, Author):

Commenting out these two if clauses is how we verify that this is truly an optimization — we should get the same results for the tests with and without it

@carljm (Contributor):

And I verified that we can indeed remove these checks and all tests pass! I'm not seeing a detectable performance improvement in the benchmark from including these lines; perhaps that just suggests conditional terminals aren't common enough for it to show up? It definitely seems like this should be faster in cases where it does apply.

@dcreager (Member, Author):

I'm not seeing a detectable performance improvement in the benchmark from including these lines

I think it might be that the merge step is now faster: if the other snapshot's reachability is ALWAYS_FALSE, then the visibility of all of its bindings should also be ALWAYS_FALSE. (I think there were cases before TDD normalization where we wouldn't be able to see that in the structure of the visibility constraint.) Merge will iterate through all of the bindings and AND their visibility constraints, but ANDing with ALWAYS_FALSE is one of the fast-path returns.

To be clear, it's a hunch — I haven't backed any of ☝️ with data!
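A self-contained sketch of the AND fast path described here (a toy representation; the actual red-knot constraints are interned TDD nodes, not this enum):

enum Constraint {
    AlwaysTrue,
    AlwaysFalse,
    And(Box<Constraint>, Box<Constraint>),
}

fn and(a: Constraint, b: Constraint) -> Constraint {
    use Constraint::*;
    match (a, b) {
        // Fast path: ANDing with ALWAYS_FALSE short-circuits, so merging in a
        // snapshot whose constraints are all ALWAYS_FALSE never has to build
        // new constraint nodes.
        (AlwaysFalse, _) | (_, AlwaysFalse) => AlwaysFalse,
        (AlwaysTrue, other) | (other, AlwaysTrue) => other,
        (a, b) => And(Box::new(a), Box::new(b)),
    }
}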

@carljm (Contributor) commented Jan 30, 2025

So I think I need to skip the simplification step when a flow contains a terminal statement.

This sounds right. The simplification step is a bit of a performance hack. I think it could be eliminated if we used BDDs instead of syntax trees to represent visibility constraints (since BDDs self-simplify). But I think it should never be wrong to skip the simplification, just potentially hurt performance. Which in this case shouldn't be too bad, since it would only occur when there's a terminal statement in the branch, which won't be the common case.

codspeed-hq bot commented Jan 30, 2025

CodSpeed Performance Report

Merging #15817 will degrade performance by 7.96%

Comparing dcreager/static-terminal (7423de2) with main (d47088c)

Summary

❌ 1 (👁 1) regressions
✅ 31 untouched benchmarks

Benchmarks breakdown

Benchmark                               BASE      HEAD      Change
👁 red_knot_check_file[incremental]     4.8 ms    5.2 ms    -7.96%

@dcreager (Member, Author) commented Feb 4, 2025

Running cargo bench locally agrees with codspeed that this is a performance regression, but using hyperfine on both black and tomllib says that it's a slight performance increase!

@dcreager dcreager marked this pull request as ready for review February 4, 2025 21:34
@dcreager (Member, Author) commented Feb 4, 2025

I'm marking this as ready for review to get 👀 on it. Just like the TDD patch that this builds on, I'm quite confused by the codspeed findings.

@dcreager dcreager changed the title [red-knot] [WIP] Combine terminal statement support with statically known branches [red-knot] Combine terminal statement support with statically known branches Feb 4, 2025
@carljm (Contributor) commented Feb 4, 2025

Running cargo bench locally agrees with codspeed that this is a performance regression, but using hyperfine on both black and tomllib says that it's a slight performance increase!

CodSpeed says it's a regression on the incremental benchmark, not the cold benchmark. Is your hyperfine testing on black and tomllib testing incremental performance (that is, make an insignificant/comment change to one file and test how fast we re-check incrementally), or cold-check performance?

Incremental regression generally suggests we are creating more Salsa-cached values that have to be revalidated on incremental re-check, or making that revalidation of cached values (rather than the actual semantic indexing and type inference itself) more expensive in some way.

@dcreager (Member, Author) commented Feb 4, 2025

CodSpeed says it's a regression on the incremental benchmark, not the cold benchmark. Is your hyperfine testing on black and tomllib testing incremental performance (that is, make an insignificant/comment change to one file and test how fast we re-check incrementally), or cold-check performance?

hyperfine is testing cold performance, but I was seeing the regression locally on cargo bench for the cold test too

@carljm (Contributor) left a comment

This looks great!

I think it's worth putting some time-boxed effort (on the scale of a few hours) into looking into the regression here, but I don't think it should block the PR; I don't see anything obviously inefficient here, and this is what we need in order to get the right semantics. It's a better use of optimization effort to look broadly for the best ROI than to focus narrowly on a specific regression.


@MichaReiser (Member):
Two things I like doing when investigating performance issues are to run red_knot -vvv and compare the output between main and my feature branch (e.g. by running over tomllib):

  • the ingredient counts printed just before exiting: are there more or fewer ingredients?
  • Paste the logs into a text diff tool and see how they differ (you may want to disable concurrency for this)


Comment on lines 644 to 646
## Bindings after a terminal statement are unreachable

Any bindings introduced after a terminal statement are unreachable, and are considered not visible.
@dcreager (Member, Author):

This is the new test case showing that bindings after a terminal statement are considered not visible. https://github.com/astral-sh/ruff/pull/15817/files#r1941996689

Note that this means we're currently implementing the "least helpful" option in #15797. (I think that's still okay for this PR, just pointing out that this will change depending on how we decide to handle unreachable code)

@carljm (Contributor):

Makes sense. I would put a TODO on that unresolved-reference diagnostic below.

I'm a little worried about the difficulty of implementing "more useful" options for checking unreachable code, but we can leave that as a separate problem.

@dcreager (Member, Author):

Makes sense. I would put a TODO on that unresolved-reference diagnostic below.

Done

I'm a little worried about the difficulty of implementing "more useful" options for checking unreachable code, but we can leave that as a separate problem.

Are you worried that this PR makes it more difficult? Or just that it's on the list of things to tackle sooner rather than later in case it requires large changes to the design?

@carljm (Contributor):

It's not really so much that this PR makes it more difficult, just that I'm not sure how much this approach will end up having to change. I don't think it's a reason not to merge this. I am curious if you have a rough sense of how we might go about implementing "check unreachable code as if it were reachable" while still preserving (as I think we must) "unreachable branches never merge back to outer control flow". That is, fixing the TODO you just added, without having that unreachable assignment become visible in the outer flow.

@dcreager (Member, Author):

I am curious if you have a rough sense of how we might go about implementing "check unreachable code as if it were reachable" while still preserving (as I think we must) "unreachable branches never merge back to outer control flow".

I'd say we'd either need to track multiple visibility constraints for each binding, or multiple "current flow states" — both being ways to represent the visibility that each binding has now, and what it would reset to at the next merge point.

But I'm also not sure that's what we'd want to implement — if we want to "check unreachable code as if it were reachable", I'm not sure that should reset at merge points. e.g. if someone inserted a return statement for debugging, I don't see a difference in UX between:

def _(cond: bool):
    if cond:
        x = 1
        return
        reveal_type(x)  # revealed: Literal[1]

and

def _(cond: bool):
    if cond:
        x = 1
        return
    reveal_type(x)  # revealed: Literal[1]

And so if we want to treat these both the same, I'd say we'd go for an option that controls the visibility of new bindings: "always true" if we want to check unreachable code as if it were reachable, and "current reachability" if not.
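A rough sketch of what such an option might look like (purely hypothetical; nothing like this exists in the PR):

// Hypothetical switch controlling the initial visibility of new bindings.
enum UnreachableCodePolicy {
    // Check unreachable code as if it were reachable: new bindings start out
    // visible regardless of the current flow's reachability.
    CheckAsReachable,
    // The behavior in this PR: new bindings inherit the current reachability,
    // so bindings after a terminal statement are never visible.
    UseCurrentReachability,
}

// `currently_reachable` stands in for the current flow's reachability.
fn initial_visibility(policy: &UnreachableCodePolicy, currently_reachable: bool) -> bool {
    match policy {
        UnreachableCodePolicy::CheckAsReachable => true,
        UnreachableCodePolicy::UseCurrentReachability => currently_reachable,
    }
}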

@carljm (Contributor):

I'd say we'd either need to track multiple visibility constraints for each binding, or multiple "current flow states" — both being ways to represent the visibility that each binding has now, and what it would reset to at the next merge point.

Yeah makes sense.

But I'm also not sure that's what we'd want to implement — if we want to "check unreachable code as if it were reachable", I'm not sure that should reset at merge points.

Not sure either. I feel like checking unreachable code as if it were reachable is kind of an unprincipled approach that may not have a sensible and consistent semantics.

@carljm (Contributor):

In any case, I think the semantics implemented in this PR are a good step forward, and we should go ahead with them for now.

@dcreager (Member, Author):

The changes in this file are easier to see with "Hide whitespace"

@dcreager (Member, Author) commented Feb 5, 2025

Two things I like doing when investigating performance issues are to run red_knot -vvv and compare the output between main and my feature branch (e.g. by running over tomllib):

  • the ingredient counts printed just before exiting: are there more or fewer ingredients?

For my future self: a --profile=profiling build is considered a "release" build for the purposes of our static max logging level, so you have to remove the release_max_level_debug feature (below) to get -vvv to print out trace log messages:

tracing = { workspace = true, features = ["release_max_level_debug"] }

@dcreager (Member, Author) commented Feb 5, 2025

I think it's worth putting some time-boxed effort (on the scale of a few hours) into looking into the regression here, but I don't think it should block the PR; I don't see anything obviously inefficient here, and this is what we need in order to get the right semantics. It's a better use of optimization effort to look broadly for the best ROI than to focus narrowly on a specific regression.

After figuring out #15817 (comment), I get identical ingredient counts for main and this feature branch:

0.059255s TRACE red_knot Counts for entire CLI run:
                                                                          total     max_live         live
red_knot_python_semantic::semantic_index::definition::Definition         7_722        7_722        7_722
red_knot_python_semantic::semantic_index::expression::Expression         1_150        1_150        1_150
red_knot_python_semantic::semantic_index::symbol::ScopeId                2_322        2_322        2_322
red_knot_python_semantic::unpack::Unpack                                    21           21           21
ruff_db::files::File                                                        76           76           76
ruff_db::source::SourceText                                                 20           20           20

@dcreager dcreager merged commit 0906554 into main Feb 5, 2025
21 checks passed
@dcreager dcreager deleted the dcreager/static-terminal branch February 5, 2025 22:47
@MichaReiser (Member):
@dcreager can you tell me a bit more about what performance investigation you did other than comparing ingredient numbers? Did you try to compare the verbose output of watching tomllib? Did you compare two recorded profiles?

I'm asking because an 8% regression is huge, especially considering we don't even know where it's coming from. For comparison, our biggest win on the incremental benchmark is #15763, and it's only 15%. This PR "eats up" 50% of that improvement. My main worry is: it's hard to figure out the root cause today, but it will be even harder to win back this regression in the future if nothing obvious shows up in benchmarks or profiles today.

@MichaReiser (Member):
I went ahead and ran knot check --watch -vvv locally over the tomllib project (after moving the files into a src directory). I appended some whitespace at the end of the file once initial checking was complete; this should mimic our benchmark fairly closely. Here's the output that compares pre-terminal-statement support with main.

One main finding: We now call symbol(__bool__) way more often.

@MichaReiser (Member):
One thing I noticed is that the following stack only shows up in the new version, suggesting that UseDefMapBuilder::snapshot has to do more heap allocation because a small vec spills to the heap more often? This could make sense, considering that we're now pushing more visibility_constraints (at least, ones that aren't always TRUE)

(Screenshot from 2025-02-06: profile showing the allocation stack under UseDefMapBuilder::snapshot.)

You have to select the small "peak" around second 3 or 5. Everything else is just me being slow to manually make an edit in the file

@dcreager (Member, Author) commented Feb 6, 2025

can you tell me a bit more about what performance investigation you did other than comparing ingredient numbers? Did you try to compare the verbose output of watching tomllib? Did you compare two recorded profiles?

I ran the benchmark test under valgrind and used the crabgrind crate to only instrument the incremental type-check call. I can post the stack traces I got, but I'll have to recompile and recollect to do that. It showed some small differences in the amount of time spent in salsa internals, which is why I focused on the ingredient counts per Carl's hypothesis.

I had also added some printfs to spit out the number of visibility constraints that were created inside of each UseDefMapBuilder, and verified that those counts were the same before/after as well.

One thing I noticed is that the following stack only shows up in the new version, suggesting that UseDefMapBuilder::snapshot has to do more heap allocation because a small vec spills to the heap more often? This could make sense, considering that we're now pushing more visibility_constraints (at least, ones that aren't always TRUE)

That looks like an IndexVec being cloned, not a SmallVec. We use an IndexVec to hold all of the visibility constraints that we create while building the use-def map. But per above, I did confirm that we're not creating any new visibility constraints with this PR — I was able to piggy-back on the scope_start_visibility constraint that we were already collecting. There's a SmallVec that records a visibility constraint for each binding, but that shouldn't be larger since we aren't introducing any new bindings.

Could this be sampling bias due to perf taking stack frame snapshots periodically? Incremental checking doesn't take long in absolute terms, so it seems like it might be more susceptible to that.

One main finding: We now call symbol(__bool__) way more often.

That suggests that the cause might be this change — we might be recording more complex visibility constraints for a noticeable number of bindings, which would take more time to evaluate than the AlwaysTrue that we were recording before. And those would necessarily have some kind of Expression inside of them that we would have to type-check. Though I would have thought that would show up as a noticeable increase in the time spent in VisibilityConstraints::evaluate.

@MichaReiser (Member):
Thanks for the extra explanation. It does show that you spent a fair amount of time investigating! Thanks for doing that.

we might be recording more complex visibility constraints for a noticeable number of bindings, which would take more time to evaluate than the AlwaysTrue that we were recording before. And those would necessarily have some kind of Expression inside of them that we would have to type-check.

I think that could partially explain the regression. It means that the queries evaluating visibility constraints have more dependencies and marking each dependency as "green" is a non-zero cost.
