
[red-knot] Combine terminal statement support with statically known branches #15817

Merged
36 commits merged into main from dcreager/static-terminal on Feb 5, 2025

Conversation

@dcreager (Member) commented Jan 29, 2025

This example from @sharkdp shows how terminal statements can appear in statically known branches: #15676 (comment)

def _(cond: bool):
    x = "a"
    if cond:
        x = "b"
        if True:
            return

    reveal_type(x)  # revealed: Literal["a", "b"]; should be Literal["a"]

We now use visibility constraints to track reachability, which allows us to model this correctly. There are two related changes as a result:

  • New bindings are not assumed to be visible; they inherit the current "scope start" visibility, which effectively means that new bindings are visible if/when the current flow is reachable

  • When simplifying visibility constraints after branching control flow, we only simplify if none of the intervening branches included a terminal statement. That is, earlier unaffected bindings are only actually unaffected if all branches make it to the merge point (see the sketch below).
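A minimal sketch of these two rules (illustrative names only, not the actual red-knot types; the real implementation interns visibility constraints rather than using a two-valued enum):

#[derive(Clone, Copy)]
enum Visibility {
    AlwaysTrue,
    AlwaysFalse,
}

struct FlowState {
    // Visibility inherited by new bindings; becomes AlwaysFalse once a
    // terminal statement makes the current flow unreachable.
    scope_start_visibility: Visibility,
    // One visibility constraint per binding recorded so far.
    bindings: Vec<Visibility>,
}

impl FlowState {
    // Rule 1: a new binding is not assumed to be visible; it inherits the
    // current scope-start visibility, so a binding recorded after a terminal
    // statement stays invisible.
    fn record_binding(&mut self) {
        self.bindings.push(self.scope_start_visibility);
    }

    // Rule 2: after merging branches, only simplify back to the pre-branch
    // state if every branch reached the merge point. If any branch ended in a
    // terminal statement, keep the merged constraints as they are.
    fn simplify_after_merge(&mut self, pre_branch: &FlowState, any_branch_terminated: bool) {
        if any_branch_terminated {
            return;
        }
        for (current, old) in self.bindings.iter_mut().zip(&pre_branch.bindings) {
            *current = *old;
        }
    }
}

The real constraints are richer than this enum, but the early return is the important part: skipping the reset is what preserves the unreachability introduced by a return inside a branch.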

@AlexWaygood AlexWaygood added the red-knot Multi-file analysis & type inference label Jan 29, 2025
@dcreager dcreager marked this pull request as ready for review January 30, 2025 16:40
@dcreager dcreager marked this pull request as draft January 30, 2025 16:41
@dcreager (Member, Author) left a comment

This is not currently working because of visibility constraint simplification:

def _(cond: bool):
    x = "a"
    if cond:
        x = "b"
        # ← {0}
        if True:
            return
            # ← {1}
        # ← {2}

    reveal_type(x)  # revealed: Literal["a"]

At point {1}, we're marking the x = "b" binding as non-visible (by setting a visibility constraint of ALWAYS_FALSE).

But at point {2}, after we merge the two flows back together (the then flow and the artificially inserted else flow), we simplify the result relative to point {0}. The simplification sees that there weren't any new bindings of x, and resets the visibility of x = "b" back to what it was at point {0}, forgetting the unreachability that we just introduced.

So I think I need to skip the simplification step when a flow contains a terminal statement.

// should be completely overwritten by the snapshot we're merging in. If the other snapshot
// is unreachable, we should return without merging.
if !snapshot.reachable {
// As an optimization, if we know statically that either of the snapshots is always
@dcreager (Member, Author):

Commenting out these two if clauses is how we verify that this is truly an optimization — we should get the same results for the tests with and without it

@carljm (Contributor):

And I verified that we can indeed remove these checks and all tests pass! I'm not seeing a detectable performance improvement in the benchmark from including these lines; perhaps that just suggests conditional terminals aren't common enough for it to show up? It definitely seems like this should be faster in cases where it does apply.

@dcreager (Member, Author):

I'm not seeing a detectable performance improvement in the benchmark from including these lines

I think it might be that the merge step is now faster: if the other snapshot's reachability is ALWAYS_FALSE, then the visibility of all of its bindings should also be ALWAYS_FALSE. (I think there were cases before TDD normalization where we wouldn't be able to see that in the structure of the visibility constraint.) Merge will iterate through all of the bindings and AND their visibility constraints, but ANDing with ALWAYS_FALSE is one of the fast-path returns.

To be clear, it's a hunch — I haven't backed any of ☝️ with data!
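A self-contained sketch of the AND fast path described here (a toy representation; the actual red-knot constraints are interned TDD nodes, not this enum):

enum Constraint {
    AlwaysTrue,
    AlwaysFalse,
    And(Box<Constraint>, Box<Constraint>),
}

fn and(a: Constraint, b: Constraint) -> Constraint {
    use Constraint::*;
    match (a, b) {
        // Fast path: ANDing with ALWAYS_FALSE short-circuits, so merging in a
        // snapshot whose constraints are all ALWAYS_FALSE never has to build
        // new constraint nodes.
        (AlwaysFalse, _) | (_, AlwaysFalse) => AlwaysFalse,
        (AlwaysTrue, other) | (other, AlwaysTrue) => other,
        (a, b) => And(Box::new(a), Box::new(b)),
    }
}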

@carljm (Contributor) commented Jan 30, 2025

So I think I need to skip the simplification step when a flow contains a terminal statement.

This sounds right. The simplification step is a bit of a performance hack. I think it could be eliminated if we used BDDs instead of syntax trees to represent visibility constraints (since BDDs self-simplify). But I think it should never be wrong to skip the simplification, just potentially hurt performance. Which in this case shouldn't be too bad, since it would only occur when there's a terminal statement in the branch, which won't be the common case.

codspeed-hq bot commented Jan 30, 2025

CodSpeed Performance Report

Merging #15817 will degrade performance by 7.96%

Comparing dcreager/static-terminal (7423de2) with main (d47088c)

Summary

❌ 1 (👁 1) regressions
✅ 31 untouched benchmarks

Benchmarks breakdown

Benchmark                               BASE      HEAD      Change
👁 red_knot_check_file[incremental]     4.8 ms    5.2 ms    -7.96%

@dcreager (Member, Author) commented Feb 4, 2025

Running cargo bench locally agrees with codspeed that this is a performance regression, but using hyperfine on both black and tomllib says that it's a slight performance increase!

@dcreager dcreager marked this pull request as ready for review February 4, 2025 21:34
@dcreager (Member, Author) commented Feb 4, 2025

I'm marking this as ready for review to get 👀 on it. Just like the TDD patch that this builds on, I'm quite confused by the codspeed findings.

@dcreager dcreager changed the title [red-knot] [WIP] Combine terminal statement support with statically known branches [red-knot] Combine terminal statement support with statically known branches Feb 4, 2025
@carljm (Contributor) commented Feb 4, 2025

Running cargo bench locally agrees with codspeed that this is a performance regression, but using hyperfine on both black and tomllib says that it's a slight performance increase!

CodSpeed says it's a regression on the incremental benchmark, not the cold benchmark. Is your hyperfine testing on black and tomllib testing incremental performance (that is, make an insignificant/comment change to one file and test how fast we re-check incrementally), or cold-check performance?

Incremental regression generally suggests we are creating more Salsa-cached values that have to be revalidated on incremental re-check, or making that revalidation of cached values (rather than the actual semantic indexing and type inference itself) more expensive in some way.

@dcreager (Member, Author) commented Feb 4, 2025

CodSpeed says it's a regression on the incremental benchmark, not the cold benchmark. Is your hyperfine testing on black and tomllib testing incremental performance (that is, make an insignificant/comment change to one file and test how fast we re-check incrementally), or cold-check performance?

hyperfine is testing cold performance, but I was seeing the regression locally on cargo bench for the cold test too

@carljm (Contributor) left a comment

This looks great!

I think it's worth putting some time-boxed effort (on the scale of a few hours) into looking into the regression here, but I don't think it should block the PR; I don't see anything obviously inefficient here, and this is what we need in order to get the right semantics. It's a better use of optimization effort to look broadly for the best ROI than to focus narrowly on a specific regression.


@MichaReiser (Member):
Two things I like doing when investigating performance issues are to run red_knot -vvv and compare the output between main and my feature branch (e.g. by running over tomllib):

  • the ingredient counts printed just before exiting: are there more or fewer ingredients?
  • Paste the logs into a text diff tool and see how they differ (you may want to disable concurrency for this)


Comment on lines 644 to 646
## Bindings after a terminal statement are unreachable

Any bindings introduced after a terminal statement are unreachable, and are considered not visible.
@dcreager (Member, Author):

This is the new test case showing that bindings after a terminal statement are considered not visible. https://github.com/astral-sh/ruff/pull/15817/files#r1941996689

Note that this means we're currently implementing the "least helpful" option in #15797. (I think that's still okay for this PR, just pointing out that this will change depending on how we decide to handle unreachable code)

@carljm (Contributor):

Makes sense. I would put a TODO on that unresolved-reference diagnostic below.

I'm a little worried about the difficulty of implementing "more useful" options for checking unreachable code, but we can leave that as a separate problem.

@dcreager (Member, Author):

Makes sense. I would put a TODO on that unresolved-reference diagnostic below.

Done

I'm a little worried about the difficulty of implementing "more useful" options for checking unreachable code, but we can leave that as a separate problem.

Are you worried that this PR makes it more difficult? Or just that it's on the list of things to tackle sooner rather than later in case it requires large changes to the design?

@carljm (Contributor):

It's not really so much that this PR makes it more difficult, just that I'm not sure how much this approach will end up having to change. I don't think it's a reason not to merge this. I am curious if you have a rough sense of how we might go about implementing "check unreachable code as if it were reachable" while still preserving (as I think we must) "unreachable branches never merge back to outer control flow". That is, fixing the TODO you just added, without having that unreachable assignment become visible in the outer flow.

@dcreager (Member, Author):

I am curious if you have a rough sense of how we might go about implementing "check unreachable code as if it were reachable" while still preserving (as I think we must) "unreachable branches never merge back to outer control flow".

I'd say we'd either need to track multiple visibility constraints for each binding, or multiple "current flow states" — both being ways to represent the visibility that each binding has now, and what it would reset to at the next merge point.

But I'm also not sure that's what we'd want to implement — if we want to "check unreachable code as if it were reachable", I'm not sure that should reset at merge points. e.g. if someone inserted a return statement for debugging, I don't see a difference in UX between:

def _(cond: bool):
    if cond:
        x = 1
        return
        reveal_type(x)  # revealed: Literal[1]

and

def _(cond: bool):
    if cond:
        x = 1
        return
    reveal_type(x)  # revealed: Literal[1]

And so if we want to treat these both the same, I'd say we'd go for an option that controls the visibility of new bindings: "always true" if we want to check unreachable code as if it were reachable, and "current reachability" if not.
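A rough sketch of what such an option might look like (purely hypothetical; nothing like this exists in the PR):

// Hypothetical switch controlling the initial visibility of new bindings.
enum UnreachableCodePolicy {
    // Check unreachable code as if it were reachable: new bindings start out
    // visible regardless of the current flow's reachability.
    CheckAsReachable,
    // The behavior in this PR: new bindings inherit the current reachability,
    // so bindings after a terminal statement are never visible.
    UseCurrentReachability,
}

// `currently_reachable` stands in for the current flow's reachability.
fn initial_visibility(policy: &UnreachableCodePolicy, currently_reachable: bool) -> bool {
    match policy {
        UnreachableCodePolicy::CheckAsReachable => true,
        UnreachableCodePolicy::UseCurrentReachability => currently_reachable,
    }
}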

@carljm (Contributor):

I'd say we'd either need to track multiple visibility constraints for each binding, or multiple "current flow states" — both being ways to represent the visibility that each binding has now, and what it would reset to at the next merge point.

Yeah makes sense.

But I'm also not sure that's what we'd want to implement — if we want to "check unreachable code as if it were reachable", I'm not sure that should reset at merge points.

Not sure either. I feel like checking unreachable code as if it were reachable is kind of an unprincipled approach that may not have a sensible and consistent semantics.

@carljm (Contributor):

In any case, I think the semantics implemented in this PR are a good step forward, and we should go ahead with them for now.

@dcreager (Member, Author):

The changes in this file are easier to see with "Hide whitespace"

@dcreager (Member, Author) commented Feb 5, 2025

Two things I like doing when investigating performance issues are to run red_knot -vvv and compare the output between main and my feature branch (e.g. by running over tomllib):

  • the ingredient counts printed just before exiting: are there more or fewer ingredients?

For my future self: a --profile=profiling build is considered a "release" build for the purposes of our static max logging level, so you have to remove the release_max_level_debug feature (below) to get -vvv to print out trace log messages:

tracing = { workspace = true, features = ["release_max_level_debug"] }

@dcreager (Member, Author) commented Feb 5, 2025

I think it's worth putting some time-boxed effort (on the scale of a few hours) into looking into the regression here, but I don't think it should block the PR; I don't see anything obviously inefficient here, and this is what we need in order to get the right semantics. It's a better use of optimization effort to look broadly for the best ROI than to focus narrowly on a specific regression.

After figuring out #15817 (comment), I get identical ingredient counts for main and this feature branch:

0.059255s TRACE red_knot Counts for entire CLI run:
                                                                          total     max_live         live
red_knot_python_semantic::semantic_index::definition::Definition         7_722        7_722        7_722
red_knot_python_semantic::semantic_index::expression::Expression         1_150        1_150        1_150
red_knot_python_semantic::semantic_index::symbol::ScopeId                2_322        2_322        2_322
red_knot_python_semantic::unpack::Unpack                                    21           21           21
ruff_db::files::File                                                        76           76           76
ruff_db::source::SourceText                                                 20           20           20

@dcreager dcreager merged commit 0906554 into main Feb 5, 2025
21 checks passed
@dcreager dcreager deleted the dcreager/static-terminal branch February 5, 2025 22:47
@MichaReiser (Member):
@dcreager can you tell me a bit more about what performance investigation you did other than comparing ingredient numbers? Did you try to compare the verbose output of watching tomllib? Did you compare two recorded profiles?

I'm asking because an 8% regression is huge, especially considering we don't even know where it's coming from. For comparison, our biggest win on the incremental benchmark is #15763, and it's only 15%. This PR "eats up" 50% of that improvement. My main worry is: it's hard to figure out the root cause today, but it will be even harder to win back this regression in the future if nothing obvious shows up in benchmarks or profiles today.

@MichaReiser (Member):
I went ahead and ran knot check --watch -vvv locally over the tomllib project (after moving the files into a src directory). I appended some whitespace at the end of the file once initial checking was complete; this should mimic our benchmark fairly closely. Here's the output that compares pre-terminal-statement support with main.

One main finding: We now call symbol(__bool__) way more often.

@MichaReiser (Member):
One thing I noticed is that the following stack only shows up in the new version, suggesting that UseDefMapBuilder::snapshot has to do more heap allocation because a small vec spills to the heap more often? This could make sense, considering that we're now pushing more visibility_constraints (at least, ones that aren't always TRUE)

(Screenshot from 2025-02-06: profile showing the allocation stack under UseDefMapBuilder::snapshot.)

You have to select the small "peak" around second 3 or 5. Everything else is just me being slow to manually make an edit in the file

@dcreager (Member, Author) commented Feb 6, 2025

can you tell me a bit more about what performance investigation you did other than comparing ingredient numbers? Did you try to compare the verbose output of watching tomllib? Did you compare two recorded profiles?

I ran the benchmark test under valgrind and used the crabgrind crate to only instrument the incremental type-check call. I can post the stack traces I got, but I'll have to recompile and recollect to do that. It showed some small differences in the amount of time spent in salsa internals, which is why I focused on the ingredient counts per Carl's hypothesis.

I had also added some printfs to spit out the number of visibility constraints that were created inside of each UseDefMapBuilder, and verified that those counts were the same before/after as well.

One thing I noticed is that the following stack only shows up in the new version, suggesting that UseDefMapBuilder::snapshot has to do more heap allocation because a small vec spills to the heap more often? This could make sense, considering that we're now pushing more visibility_constraints (at least, ones that aren't always TRUE)

That looks like an IndexVec being cloned, not a SmallVec. We use an IndexVec to hold all of the visibility constraints that we create while building the use-def map. But per above, I did confirm that we're not creating any new visibility constraints with this PR — I was able to piggy-back on the scope_start_visibility constraint that we were already collecting. There's a SmallVec that records a visibility constraint for each binding, but that shouldn't be larger since we aren't introducing any new bindings.

Could this be sampling bias due to perf taking stack frame snapshots periodically? Incremental checking doesn't take long in absolute terms, so it seems like it might be more susceptible to that.

One main finding: We now call symbol(__bool__) way more often.

That suggests that the cause might be this change — we might be recording more complex visibility constraints for a noticeable number of bindings, which would take more time to evaluate than the AlwaysTrue that we were recording before. And those would necessarily have some kind of Expression inside of them that we would have to type-check. Though I would have thought that would show up as a noticeable increase in the time spent in VisibilityConstraints::evaluate.

@MichaReiser (Member):
Thanks for the extra explanation. It does show that you spent a fair amount of time investigating! Thanks for doing that.

we might be recording more complex visibility constraints for a noticeable number of bindings, which would take more time to evaluate than the AlwaysTrue that we were recording before. And those would necessarily have some kind of Expression inside of them that we would have to type-check.

I think that could partially explain the regression. It means that the queries evaluating visibility constraints have more dependencies and marking each dependency as "green" is a non-zero cost.
