
Provide better incrementality when items are changed #19837


Open · wants to merge 6 commits into base: master

Conversation

ChayimFriedman2
Contributor

@ChayimFriedman2 ChayimFriedman2 commented May 21, 2025

Prior to this PR, adding a struct, enum, or impl would invalidate all trait solver queries in almost every circumstance. Trait solving, perhaps unexpectedly, is the most expensive operation in rust-analyzer. This meant that adding a struct/enum/impl caused features like semantic highlighting, inlay hints, or diagnostics to be fully recomputed, making the editing experience feel sluggish. How sluggish depended on the size and complexity of the project in question. With this PR, the sluggishness should be reduced.


This pull request is divided into three parts, nicely separated into commits (reviewing commit-by-commit is recommended):

  1. Stabilize ast ids. Currently, adding or removing an item invalidates the ast id of everything below it in the same file. This has severe consequences, as quite a few things store an AstId. The first commit fixes that: by hashing the important bits of the item and putting that in the AstId, we ensure the ast id is invalidated only if, e.g., the name of the item changes.
  2. Use ast ids instead of item tree ids in IDs: today, our IDs (FunctionId etc.) are defined using item tree ids. Before this PR, item tree ids were slightly better for incrementality than ast ids, because they group items by kind, so changes to an item of a different kind won't invalidate this item. But after the first commit, ast ids are pretty much stable while item tree ids remain unstable. That means that, e.g., adding a struct invalidates all following structs, which has even more severe consequences than unstable ast ids (for example, it invalidates all impls that refer to the invalidated structs, and that in turn invalidates all trait solving queries). To fix that, the second commit switches the id types to use ast ids. The most important consequence, and the reason this is an invasive change, is that you no longer have a way to obtain the item tree data from an id (only its ast id, which can bring you to the syntax tree). This effectively makes the item tree exclusive to the def map. Since a lot of things used to retrieve their data partially or fully from the item tree, this is quite a massive change. All of these (e.g. the signature queries) now retrieve their data directly from the ast.
  3. After the second commit, the item tree contains a non-trivial amount of data that is unused: it is not needed for the def map, and nothing else uses the item tree anymore. The third commit removes it, including fields, enum variants, and more.
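The stabilized ast id scheme from the first commit can be sketched roughly as follows. This is a minimal illustration, not rust-analyzer's actual types: the struct layout, field names, and the `erased_ast_id` helper are all hypothetical, and the real implementation hashes more "important bits" than just the name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical sketch of an erased ast id derived from an item's
/// "important bits" instead of its position in the file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ErasedAstId {
    kind: u8,           // item kind (struct, enum, impl, ...)
    name_hash: u16,     // truncated hash of the name, kept small because spans embed this id
    disambiguator: u16, // separates same-kind, same-name siblings
}

fn erased_ast_id(kind: u8, name: &str, nth_same_hash: u16) -> ErasedAstId {
    let mut hasher = DefaultHasher::new();
    name.hash(&mut hasher);
    ErasedAstId { kind, name_hash: hasher.finish() as u16, disambiguator: nth_same_hash }
}

fn main() {
    // Adding or removing unrelated items above `Foo` no longer shifts its id,
    // because the id is derived from the item itself, not from file order:
    assert_eq!(erased_ast_id(0, "Foo", 0), erased_ast_id(0, "Foo", 0));
    // Changing the hashed bits (here, the item kind) yields a new id,
    // which is exactly the invalidation we want:
    assert_ne!(erased_ast_id(0, "Foo", 0), erased_ast_id(1, "Foo", 0));
}
```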

Unfortunately, this introduces a very non-trivial regression: memory usage regresses by 400 megabytes (!!), from 2392mb to 2810mb. Speed also regresses by eight seconds on my machine, 91.172s -> 99.477s.

Let's start with the speed regression: unfortunately, I believe it is inherent to the change, and we can't do much to improve or eliminate it. Fortunately, the regression isn't that large, and the gain in incrementality should justify it, given that IDEs derive their power from incrementality and laziness, not raw speed.

The memory regression is a different story. I don't think we can tolerate a 400mb regression (although, to be honest, if the choice were either that or this PR, I'm not at all sure I would prefer the memory). Fortunately, I believe a large part of it is solvable. I didn't check (I can if needed), but my belief is that there are a few factors behind the regression: (a) attributes are now duplicated between the item tree and the attrs() query (they no longer share an Arc), and (b) ErasedFileAstId grew fourfold, and it is used in spans, which are already our most memory-heavy creatures. There is nothing I can do about the attribute duplication, but fortunately, I believe the majority is spans, and here I do have hope. I have an idea (maybe I really should come back to that...) for shrinking spans considerably, and furthermore, for not carrying the ast id in every span.

Even after all memory improvements, some regression will stay, because it's inherent to this approach. I believe we need to decide what is more important to us: incrementality or memory usage? I lean towards incrementality, given that the lack of it can make rust-analyzer barely usable on large projects, while memory is cheap.

Closes #19829.

Closes #19821.

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 21, 2025
@ChayimFriedman2
Contributor Author

ChayimFriedman2 commented May 21, 2025

I forgot we can LRU the ast id map (which I planned to); this reduces the memory regression to 360mb (2750mb).

@ChayimFriedman2
Contributor Author

I was able to get the memory regression down to 45mb by using a u16 hash. I think this is very acceptable and this PR is ready.
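To illustrate why the hash width matters so much: the erased ast id is embedded in every span, so a couple of bytes per id multiplies across the whole project. The layouts below are hypothetical stand-ins, not the actual ErasedFileAstId definition.

```rust
// Hypothetical layouts, for illustration only: a kind byte plus a hash.
// With a u32 hash, alignment pads the struct to 8 bytes; with a u16 hash
// it fits in 4. Spans are rust-analyzer's most memory-heavy data, so this
// per-id difference compounds into hundreds of megabytes project-wide.
#[derive(Clone, Copy)]
struct ErasedIdU32 {
    kind: u8,
    hash: u32,
}

#[derive(Clone, Copy)]
struct ErasedIdU16 {
    kind: u8,
    hash: u16,
}

fn main() {
    assert_eq!(std::mem::size_of::<ErasedIdU32>(), 8);
    assert_eq!(std::mem::size_of::<ErasedIdU16>(), 4);
}
```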

@ChayimFriedman2
Contributor Author

The previous run was stuck for 45 minutes; I cancelled it and reran, and it completed fast... I smell a race condition. Not good. But I have no idea how to debug it.

Member

@Veykril Veykril left a comment


(review for the first commit only)

// After all, the block will then contain the *outer* item, so we allocate
// an ID for it anyway.
let mut blocks = Vec::new();
let mut curr_layer = vec![(node.clone(), None)];
Member


can we undo the inlining of bdfs? Imo this hurts readability of the code quite a bit

Contributor Author


I prefer not to, because with the handling for block exprs it's tightly coupled with the rest of the code.

@Veykril
Member

Veykril commented May 22, 2025

I am curious about the slowdown here; I am a bit surprised by that (though I have yet to look at the other commits).

Regarding memory usage (for one, the PR description needs updating :^): what difference does the LRU addition now make? Could you check (if it's not too much work) how things change if we used a single u64 encoding instead of a u32 (doubling the size)?

Imo we shouldn't LRU this query; it is a very frequently used one, so the LRU bookkeeping probably adds some overhead. I also think it will hurt incrementality: we might discard the result, then the parse tree changes in a way that wouldn't affect the ast id map, but now we need to recompute it and can't backdate it, as there is no previous result.

@ChayimFriedman2
Contributor Author

Incrementality won't change by LRU'ing the ast id map (I wouldn't do it if it would), because the IDs are not derived from it; they are derived from the ast ids stored in the item tree, which is not LRU'd.

I will check the effect on performance, and of using u64, later (although I believe there is no reason to use u64).

@ChayimFriedman2
Contributor Author

So: I checked the memory and speed impact of the various configurations, and:

Speed isn't affected by whether we LRU the AST ID map.

Memory usage is:

  • Baseline (before this PR) - 2392mb
  • u16 hashes, ast_id_map with LRU (that is, the current version) - 2433mb
  • u16 hashes, non-LRU'd ast_id_map - 2480mb
  • u32 hashes, ast_id_map with LRU - 2645mb. I think this is out of the question.

When considering the impact of the LRU on ast_id_map, remember that we also LRU parse, which is needed by everything that needs ast_id_map.

@Veykril
Member

Veykril commented May 28, 2025

Woah, okay, the size of AstId here definitely makes a big difference then.

@ChayimFriedman2 ChayimFriedman2 force-pushed the stable-astid branch 2 times, most recently from 556e663 to 050c544 Compare May 29, 2025 05:12
Member

@Veykril Veykril left a comment


second commit review

let (def_map, local_def_map) = module_id.local_def_map(db);
Self {
db,
module_id,
def_map,
local_def_map,
ast_id_map: db.ast_id_map(file_id),
span_map: db.span_map(file_id),
Member


This might be a problem. We now draw a dependency edge to the span map here all the time, and the span map is currently very invalidation-prone (something we still need to fix somehow). So I believe this will introduce a lot of invalidations.

Member


In fact, I would've expected this part to work the same as def map collection, via the item tree. It is technically doing the same thing, just in a delayed, lazy fashion.

Member


Ah I see, you discarded the assoc item stuff from the item tree in f8b8e0d (#19837)

Still, we should probably at least throw the span map in a LazyCell or so.
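For readers unfamiliar with it, `std::cell::LazyCell` defers a computation until first dereference and caches the result, which is what the suggestion amounts to. A minimal sketch of the mechanism, with a string standing in for the real span map:

```rust
use std::cell::{Cell, LazyCell};

fn main() {
    // Counter to observe how many times the "span map" gets built.
    let builds = Cell::new(0);
    let span_map = LazyCell::new(|| {
        builds.set(builds.get() + 1);
        "span map" // stand-in for the real, expensive-to-build span map
    });
    assert_eq!(builds.get(), 0); // nothing computed yet
    let _ = *span_map;           // first access runs the closure
    let _ = *span_map;           // later accesses reuse the cached value
    assert_eq!(builds.get(), 1); // built exactly once
}
```

The idea being that lowering paths which never touch spans would not pay for (or depend on) the span map at all.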

Contributor Author

@ChayimFriedman2 ChayimFriedman2 Jun 6, 2025


This will invalidate the collection of the assoc items, but not queries depending on the assoc items themselves, and only when the file is edited. So, not that bad.

Also, I realized we can probably get rid of RealSpanMap completely; you can retrieve the span of a node/token from its nearest ancestor that has an ast id, plus its offset from it. We will need to benchmark, though, whether there is a difference from the binary search that RealSpanMap does.

We can't put it in a LazyCell because it is needed for every assoc item, for its attributes.
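A rough sketch of the RealSpanMap-replacement idea mentioned above. All names and shapes here are hypothetical (the real spans carry more information): anchor each span to the nearest enclosing item that has an ast id, plus an offset from that item's start, instead of binary-searching a separate map.

```rust
/// Hypothetical span: an anchoring ast id plus an offset from the anchor.
#[derive(Debug, PartialEq)]
struct Span {
    anchor: u32, // erased ast id of the nearest ancestor item
    offset: u32, // distance from the anchor's start to the token
}

/// `ancestors` lists (ast_id, item_start) pairs, outermost first.
/// The file root always has an ast id, so a match always exists.
fn span_for(token_start: u32, ancestors: &[(u32, u32)]) -> Span {
    let &(anchor, item_start) = ancestors
        .iter()
        .rev() // innermost enclosing item wins
        .find(|&&(_, start)| start <= token_start)
        .expect("the file root always has an ast id");
    Span { anchor, offset: token_start - item_start }
}

fn main() {
    // Token at offset 57; the nearest enclosing item (id 7) starts at 40:
    assert_eq!(span_for(57, &[(1, 0), (7, 40)]), Span { anchor: 7, offset: 17 });
    // Token at offset 5 is only inside the file root (id 1):
    assert_eq!(span_for(5, &[(1, 0), (7, 40)]), Span { anchor: 1, offset: 5 });
}
```

Edits below an item then only shift offsets relative to a stable anchor, rather than invalidating a whole file-wide map.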

@@ -123,6 +126,8 @@ pub trait DefDatabase: InternDatabase + ExpandDatabase + SourceDatabase {
id: VariantId,
) -> (Arc<VariantFields>, Arc<ExpressionStoreSourceMap>);

// FIXME: Should we make this transparent? The only unstable thing in `enum_variants_with_diagnostics()`
Member


sounds reasonable

@Veykril
Member

Veykril commented May 30, 2025

I believe this will also mostly resolve #16176, as we have a lot fewer stale IDs popping up.

Instead of simple numbering, we hash important bits, like the name of the item.

This will allow for much better incrementality, e.g. when you add an item. Currently, this invalidates the IDs of all following items, which invalidates pretty much everything.
@ChayimFriedman2 ChayimFriedman2 force-pushed the stable-astid branch 2 times, most recently from 1f9cb91 to 051e4f5 Compare June 6, 2025 04:06
Item tree IDs are very unstable (adding an item of a kind invalidates all following items of the same kind). Instead use ast ids, which, since the previous commit, are pretty stable.
I'm joking, but now that the def map is the only thing that uses the item tree, we can remove a lot of things from it that aren't needed for the def map.
We can do that and it's pretty heavy.
These benchmarks don't measure what they were supposed to, but that's a matter for another time.

The reason we *add* a query to the list is that previously the attrs in the signature queries were computed directly from the item tree, and now they call `db.attrs()`. The reason we don't *remove* any query is that, as I said, those benchmarks don't measure what they should.
Successfully merging this pull request may close these issues.

Consider hashing nodes in AstIdMap for more stable IDs