
Provide better incrementality when items are changed #19837


Open · wants to merge 6 commits into base: master

Conversation

ChayimFriedman2
Contributor

@ChayimFriedman2 ChayimFriedman2 commented May 21, 2025

Prior to this PR, adding a struct, enum, or impl would invalidate all trait solver queries in almost every circumstance. Trait solving, perhaps unexpectedly, is the most expensive operation in rust-analyzer. This meant that adding a struct/enum/impl caused features like semantic highlighting, inlay hints, or diagnostics to be fully recomputed, making the editing experience feel sluggish. How sluggish depended on the size and complexity of the project in question. With this PR, the sluggishness should be reduced.


This pull request is divided into three parts, nicely separated into commits (reviewing commit-by-commit is recommended):

  1. Stabilize ast ids. Currently, adding or removing an item invalidates the ast id of everything below it in the same file. This has severe consequences, as quite a few things store an AstId. The first commit fixes that: by hashing the important bits of the item and putting that in the AstId, we ensure the ast id is invalidated only if, e.g., the name of the item changes.
  2. Use ast ids instead of item tree ids in IDs: today, our IDs (FunctionId etc.) are defined using item tree ids. Before this PR, item tree ids were slightly better for incrementality than ast ids, because they group items by kind, so changes to an item of a different kind won't invalidate this item. But after the first commit, ast ids are pretty much stable while item tree ids remain unstable. That means that, e.g., adding a struct invalidates all following structs, which has even more severe consequences than unstable ast ids (for example, it invalidates all impls that refer to the invalidated structs, and that in turn invalidates all trait solving queries). To fix that, the second commit switches the id types to use ast ids. The most important consequence, and the reason this is an invasive change, is that you no longer have a way to obtain the item tree data from an id (only its ast id, which can bring you to the syntax tree). This effectively makes the item tree exclusive to the def map. Since a lot of things used to retrieve their data partially or fully from the item tree, this is quite a massive change. All of these (e.g. the signature queries) now retrieve their data directly from the ast.
  3. After the second commit, the item tree contains a non-trivial amount of data that is unused: it is not needed for the def map, and nothing else uses the item tree anymore. The third commit removes it, including fields, enum variants, and more.
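The stabilized ast id scheme from the first commit can be sketched roughly as follows. This is a minimal illustration, not rust-analyzer's actual types: the struct layout, field names, and the `erased_ast_id` helper are all hypothetical, and the real implementation hashes more "important bits" than just the name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical sketch of an erased ast id derived from an item's
/// "important bits" instead of its position in the file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ErasedAstId {
    kind: u8,           // item kind (struct, enum, impl, ...)
    name_hash: u16,     // truncated hash of the name, kept small because spans embed this id
    disambiguator: u16, // separates same-kind, same-name siblings
}

fn erased_ast_id(kind: u8, name: &str, nth_same_hash: u16) -> ErasedAstId {
    let mut hasher = DefaultHasher::new();
    name.hash(&mut hasher);
    ErasedAstId { kind, name_hash: hasher.finish() as u16, disambiguator: nth_same_hash }
}

fn main() {
    // Adding or removing unrelated items above `Foo` no longer shifts its id,
    // because the id is derived from the item itself, not from file order:
    assert_eq!(erased_ast_id(0, "Foo", 0), erased_ast_id(0, "Foo", 0));
    // Changing the hashed bits (here, the item kind) yields a new id,
    // which is exactly the invalidation we want:
    assert_ne!(erased_ast_id(0, "Foo", 0), erased_ast_id(1, "Foo", 0));
}
```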

Unfortunately, this introduces a very non-trivial regression: memory usage regresses by 400 megabytes (!!), from 2392mb to 2810mb. Speed also regresses by eight seconds on my machine, 91.172s -> 99.477s.

Let's start with the speed regression: unfortunately, I believe it is inherent to the change, and we can't do much to improve or eliminate it. Fortunately, the regression isn't that large, and the gain in incrementality should justify it, given that IDEs derive their power from incrementality and laziness, not raw speed.

The memory regression is a different story. I don't think we can tolerate a 400mb regression (although, to be honest, if the choice were either that or this PR, I'm not at all sure I would prefer the memory). Fortunately, I believe a large part of it is solvable. I didn't check (I can if needed), but my belief is that there are a few factors behind the regression: (a) attributes are now duplicated between the item tree and the attrs() query (they no longer share an Arc), and (b) ErasedFileAstId grew fourfold, and it is used in spans, which are already our most memory-heavy creatures. There is nothing I can do about the attribute duplication, but fortunately, I believe the majority is spans, and here I do have hope. I have an idea (maybe I really should come back to that...) for shrinking spans considerably, and furthermore, for not carrying the ast id in every span.

Even after all memory improvements, some regression will stay, because it's inherent to this approach. I believe we need to decide what is more important to us: incrementality or memory usage? I lean towards incrementality, given that the lack of it can make rust-analyzer barely usable on large projects, while memory is cheap.

Closes #19829.

Closes #19821.

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 21, 2025
@ChayimFriedman2
Contributor Author

ChayimFriedman2 commented May 21, 2025

I forgot we can LRU the ast id map (which I planned to); this reduces the memory regression to 360mb (2750mb).

@ChayimFriedman2
Contributor Author

I was able to get the memory regression down to 45mb by using a u16 hash. I think this is very acceptable and this PR is ready.
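To illustrate why the hash width matters so much: the erased ast id is embedded in every span, so a couple of bytes per id multiplies across the whole project. The layouts below are hypothetical stand-ins, not the actual ErasedFileAstId definition.

```rust
// Hypothetical layouts, for illustration only: a kind byte plus a hash.
// With a u32 hash, alignment pads the struct to 8 bytes; with a u16 hash
// it fits in 4. Spans are rust-analyzer's most memory-heavy data, so this
// per-id difference compounds into hundreds of megabytes project-wide.
#[derive(Clone, Copy)]
struct ErasedIdU32 {
    kind: u8,
    hash: u32,
}

#[derive(Clone, Copy)]
struct ErasedIdU16 {
    kind: u8,
    hash: u16,
}

fn main() {
    assert_eq!(std::mem::size_of::<ErasedIdU32>(), 8);
    assert_eq!(std::mem::size_of::<ErasedIdU16>(), 4);
}
```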

@ChayimFriedman2
Contributor Author

The previous run was stuck for 45 minutes; I cancelled it and reran, and it completed fast... I smell a race condition. Not good. But I have no idea how to debug it.

Member

@Veykril Veykril left a comment


(review for the first commit only)

// After all, the block will then contain the *outer* item, so we allocate
// an ID for it anyway.
let mut blocks = Vec::new();
let mut curr_layer = vec![(node.clone(), None)];
Member


can we undo the inlining of bdfs? Imo this hurts readability of the code quite a bit

Contributor Author


I prefer not to, because with the handling for block exprs it's tightly coupled with the rest of the code.

@Veykril
Member

Veykril commented May 22, 2025

I am curious about the slowdown here; I am a bit surprised by that (though I have yet to look at the other commits).

Regarding memory usage (for one, the PR description needs updating :^): what difference does the LRU addition now make? Could you check (if it's not too much work) how things change if we used a single u64 encoding instead of a u32 (doubling the size)?

Imo we shouldn't LRU this query; it is a very frequently used one, so the LRU bookkeeping probably adds some overhead. I also think it will hurt incrementality: we might discard the result, then the parse tree changes in a way that wouldn't affect the ast id map, but now we need to recompute it and can't backdate it, as there is no previous result.

@ChayimFriedman2
Contributor Author

Incrementality won't change by LRU'ing the ast id map (I wouldn't do it if it would), because the IDs are not derived from it; they are derived from the ast ids stored in the item tree, which is not LRU'd.

I will check the effect on performance, and of using u64, later (although I believe there is no reason to use u64).

@ChayimFriedman2
Contributor Author

So: I checked the memory and speed impact of the various configurations, and:

Speed isn't affected by whether we LRU the AST ID map.

Memory usage is:

  • Baseline (before this PR) - 2392mb
  • u16 hashes, ast_id_map with LRU (that is, the current version) - 2433mb
  • u16 hashes, non-LRU'd ast_id_map - 2480mb
  • u32 hashes, ast_id_map with LRU - 2645mb. I think this is out of the question.

When considering the impact of the LRU on ast_id_map, remember that we also LRU parse, which is needed by everything that needs ast_id_map.

@Veykril
Member

Veykril commented May 28, 2025

Woah, okay, the size of AstId here definitely makes a big difference then.

@ChayimFriedman2 ChayimFriedman2 force-pushed the stable-astid branch 2 times, most recently from 556e663 to 050c544 Compare May 29, 2025 05:12
Member

@Veykril Veykril left a comment


second commit review

let (def_map, local_def_map) = module_id.local_def_map(db);
Self {
db,
module_id,
def_map,
local_def_map,
ast_id_map: db.ast_id_map(file_id),
span_map: db.span_map(file_id),
Member


This might be a problem. We now draw a dependency edge to the span map here all the time, and the span map is currently very invalidation-prone (something we still need to fix somehow). So I believe this will introduce a lot of invalidations.

Member


In fact, I would've expected this part to work the same as def map collection, via the item tree. It is technically doing the same thing, just in a delayed, lazy fashion.

Member


Ah I see, you discarded the assoc item stuff from the item tree in f8b8e0d (#19837)

Still, we should probably at least throw the span map in a LazyCell or so.
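For readers unfamiliar with it, `std::cell::LazyCell` defers a computation until first dereference and caches the result, which is what the suggestion amounts to. A minimal sketch of the mechanism, with a string standing in for the real span map:

```rust
use std::cell::{Cell, LazyCell};

fn main() {
    // Counter to observe how many times the "span map" gets built.
    let builds = Cell::new(0);
    let span_map = LazyCell::new(|| {
        builds.set(builds.get() + 1);
        "span map" // stand-in for the real, expensive-to-build span map
    });
    assert_eq!(builds.get(), 0); // nothing computed yet
    let _ = *span_map;           // first access runs the closure
    let _ = *span_map;           // later accesses reuse the cached value
    assert_eq!(builds.get(), 1); // built exactly once
}
```

The idea being that lowering paths which never touch spans would not pay for (or depend on) the span map at all.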

Contributor Author

@ChayimFriedman2 ChayimFriedman2 Jun 6, 2025


This will invalidate the collection of the assoc items, but not queries depending on the assoc items themselves, and only when the file is edited. So, not that bad.

Also, I realized we can probably get rid of RealSpanMap completely; you can retrieve the span of a node/token from its nearest ancestor that has an ast id, plus its offset from it. We will need to benchmark, though, whether there is a difference from the binary search that RealSpanMap does.

We can't put it in a LazyCell because it is needed for every assoc item, for its attributes.
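A rough sketch of the RealSpanMap-replacement idea mentioned above. All names and shapes here are hypothetical (the real spans carry more information): anchor each span to the nearest enclosing item that has an ast id, plus an offset from that item's start, instead of binary-searching a separate map.

```rust
/// Hypothetical span: an anchoring ast id plus an offset from the anchor.
#[derive(Debug, PartialEq)]
struct Span {
    anchor: u32, // erased ast id of the nearest ancestor item
    offset: u32, // distance from the anchor's start to the token
}

/// `ancestors` lists (ast_id, item_start) pairs, outermost first.
/// The file root always has an ast id, so a match always exists.
fn span_for(token_start: u32, ancestors: &[(u32, u32)]) -> Span {
    let &(anchor, item_start) = ancestors
        .iter()
        .rev() // innermost enclosing item wins
        .find(|&&(_, start)| start <= token_start)
        .expect("the file root always has an ast id");
    Span { anchor, offset: token_start - item_start }
}

fn main() {
    // Token at offset 57; the nearest enclosing item (id 7) starts at 40:
    assert_eq!(span_for(57, &[(1, 0), (7, 40)]), Span { anchor: 7, offset: 17 });
    // Token at offset 5 is only inside the file root (id 1):
    assert_eq!(span_for(5, &[(1, 0), (7, 40)]), Span { anchor: 1, offset: 5 });
}
```

Edits below an item then only shift offsets relative to a stable anchor, rather than invalidating a whole file-wide map.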

@@ -123,6 +126,8 @@ pub trait DefDatabase: InternDatabase + ExpandDatabase + SourceDatabase {
id: VariantId,
) -> (Arc<VariantFields>, Arc<ExpressionStoreSourceMap>);

// FIXME: Should we make this transparent? The only unstable thing in `enum_variants_with_diagnostics()`
Member


sounds reasonable

@Veykril
Member

Veykril commented May 30, 2025

I believe this will also mostly resolve #16176, as we have a lot fewer stale IDs popping up.

Instead of simple numbering, we hash important bits, like the name of the item.

This will allow for much better incrementality, e.g. when you add an item. Currently, this invalidates the IDs of all following items, which invalidates pretty much everything.
@ChayimFriedman2 ChayimFriedman2 force-pushed the stable-astid branch 2 times, most recently from 1f9cb91 to 051e4f5 Compare June 6, 2025 04:06
Item tree IDs are very unstable (adding an item of a kind invalidates all following items of the same kind). Instead use ast ids, which, since the previous commit, are pretty stable.
I'm joking, but now that the def map is the only thing that uses the item tree, we can remove a lot of things from it that aren't needed for the def map.
We can do that and it's pretty heavy.
These benchmarks don't measure what they were supposed to, but that's a matter for another time.

The reason we *add* a query to the list is that previously the attrs in the signature queries were computed directly from the item tree, and now they call `db.attrs()`. The reason we don't *remove* any query is that, as I said, those benchmarks don't measure what they should.
Successfully merging this pull request may close these issues.

Consider hashing nodes in AstIdMap for more stable IDs