Provide better incrementality when items are changed #19837
Conversation
(force-pushed from 9444cba to 2ac8a9c)
I forgot we can LRU the AST ID map (which I had planned to); this reduces the memory regression to 360mb (2750mb total).
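For readers unfamiliar with the idea: "LRUing" a query means keeping only the N most recently used results in memory and recomputing evicted ones on demand. A minimal, hypothetical sketch of such a cache (this is not salsa's actual implementation, and all names here are made up):

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical LRU sketch: keep at most `cap` computed values (e.g. AST ID
// maps keyed by file) and evict the least recently used one when full.
struct LruCache<K: std::hash::Hash + Eq + Clone, V> {
    cap: usize,
    map: HashMap<K, V>,
    order: VecDeque<K>, // front = least recently used
}

impl<K: std::hash::Hash + Eq + Clone, V> LruCache<K, V> {
    fn new(cap: usize) -> Self {
        Self { cap, map: HashMap::new(), order: VecDeque::new() }
    }

    fn get_or_compute(&mut self, key: K, compute: impl FnOnce() -> V) -> &V {
        if self.map.contains_key(&key) {
            // Cache hit: move the key to the back (most recently used).
            self.order.retain(|k| k != &key);
        } else {
            if self.map.len() == self.cap {
                if let Some(lru) = self.order.pop_front() {
                    // Evict; the next access for `lru` will recompute.
                    self.map.remove(&lru);
                }
            }
            self.map.insert(key.clone(), compute());
        }
        self.order.push_back(key.clone());
        &self.map[&key]
    }
}

fn main() {
    let mut cache = LruCache::new(2);
    cache.get_or_compute("a.rs", || vec![1, 2]);
    cache.get_or_compute("b.rs", || vec![3]);
    cache.get_or_compute("a.rs", || unreachable!("still cached"));
    cache.get_or_compute("c.rs", || vec![4]); // evicts "b.rs", the LRU entry
    assert!(cache.map.contains_key("a.rs"));
    assert!(!cache.map.contains_key("b.rs"));
    println!("ok");
}
```

This trades memory for recomputation time, which is why the discussion below weighs the bookkeeping overhead and the incrementality cost of discarding results.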
(force-pushed from 14441b7 to 9138ab4)
I was able to get the memory regression down to 45mb by using a u16 hash. I think this is very acceptable and this PR is ready.
The previous run was stuck for 45 minutes; I cancelled it and reran, and it completed fast... I smell a race condition. Not good, but I have no idea how to debug it.
(review for the first commit only)
```rust
// After all, the block will then contain the *outer* item, so we allocate
// an ID for it anyway.
let mut blocks = Vec::new();
let mut curr_layer = vec![(node.clone(), None)];
```
Can we undo the inlining of `bdfs`? IMO this hurts the readability of the code quite a bit.
I prefer not to, because with the handling for block exprs it's tightly coupled with the rest of the code.
I am curious about the slowdown here; I am a bit surprised by that (though I have yet to look at the other commits). Regarding memory usage (for one, the PR description needs updating :^)): What difference does the LRU addition now make? Could you check (if it's not too much work) how things change if we used a single u64 encoding instead of a u32 (doubling the size)? IMO we shouldn't LRU this query: it is a very frequently used one, so the LRU bookkeeping probably adds some overhead. I also think it will hurt incrementality, as we might discard the result, then the parse tree changes in a way that wouldn't affect the ast id map, but now we need to recompute it and can't backdate it, as there is no previous result.
Incrementality won't change by LRUing the ast id map (I wouldn't do it if it would), because the IDs are not derived from it; they are derived from the ast ids stored in the item tree, which is not LRU'd. I will check the effect on performance and of using `u64`.
So: I checked the memory and speed impact of the various configurations. Speed isn't affected by whether we LRU the AST ID map. Memory usage is:
When considering the impact of the LRU on...

Woah, okay, size of...
(force-pushed from 556e663 to 050c544)
(review for the second commit)
```rust
let (def_map, local_def_map) = module_id.local_def_map(db);
Self {
    db,
    module_id,
    def_map,
    local_def_map,
    ast_id_map: db.ast_id_map(file_id),
    span_map: db.span_map(file_id),
```
This might be a problem. We now draw a dependency edge to the span map here all the time, and the span map is currently very invalidation-prone (something we still need to fix somehow). So I believe this will introduce a lot of invalidations.
In fact, I would've expected this part to work the same as def map collection, via the item tree. It is technically doing the same thing, just in a delayed, lazy fashion.
Ah, I see, you discarded the assoc item stuff from the item tree in f8b8e0d (#19837). Still, we should probably at least throw the span map in a `LazyCell` or so.
This will invalidate the collection of the assoc items, but not queries depending on the assoc items themselves, and only when the file is edited. So, not that bad.

Also, I realized that we can probably get rid of `RealSpanMap` completely; you can retrieve the span of a node/token from its nearest ancestor's ast id and its offset from it. We will need to benchmark, though, whether there is a difference from the binary search that `RealSpanMap` is doing.

We can't put it in a `LazyCell`, because it is needed for every assoc item, for its attributes.
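The "nearest ancestor ast id plus offset" idea can be sketched roughly like this (a hypothetical toy model, not rust-analyzer's real types — `Node`, `Span`, and the flat arena are all made up for illustration):

```rust
// Hypothetical sketch: instead of a span map doing a binary search over
// precomputed (offset, anchor) pairs, compute a token's span by walking up
// to the nearest ancestor that has an allocated AST ID and taking the
// token's offset relative to that ancestor's start.

#[derive(Debug, PartialEq)]
struct Span {
    anchor: u32, // erased AST ID of the anchoring ancestor item
    offset: u32, // text offset relative to that ancestor's start
}

// Minimal assumed syntax node: absolute start offset + optional AST ID,
// stored in a flat arena with parent indices.
struct Node {
    ast_id: Option<u32>,
    start: u32,
    parent: Option<usize>, // index into the arena
}

fn span_for(nodes: &[Node], mut idx: usize, token_start: u32) -> Span {
    // Walk up until an ancestor with an allocated AST ID is found.
    loop {
        let node = &nodes[idx];
        if let Some(id) = node.ast_id {
            return Span { anchor: id, offset: token_start - node.start };
        }
        idx = node.parent.expect("the file root always has an AST ID");
    }
}

fn main() {
    // file root (id 0) -> fn item (id 7, starts at 10) -> block (no id)
    let nodes = [
        Node { ast_id: Some(0), start: 0, parent: None },
        Node { ast_id: Some(7), start: 10, parent: Some(0) },
        Node { ast_id: None, start: 14, parent: Some(1) },
    ];
    // A token at absolute offset 20 inside the block anchors to the fn item.
    assert_eq!(span_for(&nodes, 2, 20), Span { anchor: 7, offset: 10 });
    println!("ok");
}
```

Whether the ancestor walk beats the binary search is exactly the benchmarking question raised above.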
```
@@ -123,6 +126,8 @@ pub trait DefDatabase: InternDatabase + ExpandDatabase + SourceDatabase {
    id: VariantId,
) -> (Arc<VariantFields>, Arc<ExpressionStoreSourceMap>);

// FIXME: Should we make this transparent? The only unstable thing in `enum_variants_with_diagnostics()`
```
sounds reasonable
I believe this will also mostly resolve #16176, as we have far fewer stale IDs popping up.
Instead of simple numbering, we hash important bits, like the name of the item. This will allow for much better incrementality, e.g. when you add an item. Currently, this invalidates the IDs of all following items, which invalidates pretty much everything.
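The commit message's idea can be illustrated with a small self-contained sketch (hypothetical: the hash function, field layout, and names below are made up for this example, not rust-analyzer's actual encoding):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical sketch: derive an item's erased AST ID from its kind and name
// instead of from its sequential position, so inserting a new item does not
// change the IDs of the items around it. A small disambiguator handles items
// that hash identically (e.g. two `fn foo` in different blocks).
fn stable_id(kind: &str, name: &str, disambiguator: u16) -> u32 {
    let mut h = DefaultHasher::new();
    kind.hash(&mut h);
    name.hash(&mut h);
    // Keep only 16 bits of the hash (the PR mentions a u16 hash to save
    // memory) and pack the disambiguator into the other half.
    let hash16 = (h.finish() & 0xFFFF) as u32;
    hash16 << 16 | disambiguator as u32
}

fn main() {
    // IDs before "editing" the file...
    let foo = stable_id("fn", "foo", 0);
    let bar = stable_id("struct", "Bar", 0);

    // ...and after adding a new item: `foo` and `Bar` keep their IDs,
    // unlike with sequential numbering, where every following item's ID
    // would shift by one.
    let _new_item = stable_id("fn", "new_item", 0);
    assert_eq!(foo, stable_id("fn", "foo", 0));
    assert_eq!(bar, stable_id("struct", "Bar", 0));

    // Renaming the item, however, does change its ID.
    assert_ne!(foo, stable_id("fn", "renamed", 0));
    println!("stable across insertions");
}
```

With sequential numbering, adding an item invalidates every following item's ID; with content hashing, only edits to the hashed bits (like the name) invalidate an ID.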
(force-pushed from 1f9cb91 to 051e4f5)
Item tree IDs are very unstable (adding an item of a kind invalidates all following items of the same kind). Instead use ast ids, which, since the previous commit, are pretty stable.
I'm joking, but now that the def map is the only thing that uses the item tree, we can remove a lot of things from it that aren't needed for the def map.
We can do that and it's pretty heavy.
These benchmarks don't measure what they were supposed to, but that's a matter for another time. The reason we *add* a query to the list is that previously the attrs in the signature queries were computed directly from the item tree, and now they call `db.attrs()`. The reason we don't *remove* any query is that, as I said, those benchmarks don't measure what they should.
(force-pushed from 051e4f5 to b35f140)
Prior to this PR, adding a struct, enum, or impl would invalidate all trait solver queries in almost every circumstance. Trait solving, perhaps unexpectedly, is the most expensive operation in rust-analyzer. This meant that adding a struct/enum/impl would cause features like semantic highlighting, inlay hints, or diagnostics to be fully recomputed, and therefore make the editing experience feel sluggish. How sluggish depended on the size and complexity of the project in question. With this PR, the sluggishness should be reduced.
This pull request is divided into three parts, nicely separated into commits (reviewing commit-by-commit is recommended):
1. Previously, adding an item would invalidate the `AstId` of every item following it. The first commit fixes that. By hashing the important bits of the item and putting that in the `AstId`, we ensure that an item's ast id is invalidated only if, e.g., the name of the item changes.
2. Item IDs (`FunctionId` etc.) are defined using item tree ids. Before this PR, item tree ids were slightly better for incrementality than ast ids, because they group by item kind, so changes to an item of a different kind won't invalidate this item. But after the first commit, ast ids are pretty much stable, while item tree ids remain unstable. That means that, e.g., adding a struct invalidates all following structs, which has even more severe consequences than those of ast ids (for example, it invalidates all impls that refer to the invalidated structs, and that in turn invalidates all trait solving queries). To fix that, the second commit switches the id types to use ast ids. The most important consequence, and the reason this is an invasive change, is that you no longer have a way to obtain the item tree data from an id (only its ast id, which can bring you to the syntax tree). This effectively makes the item tree exclusive to the def map. Since a lot of things used to retrieve their data partially or fully from the item tree, this is quite a massive change. All of these (e.g. the signature queries) are changed to retrieve their data directly from the ast.

Unfortunately, this introduces a very non-trivial regression: memory usage regresses by 400 megabytes (!!), from 2392mb to 2810mb. Speed also regresses by eight seconds on my machine, 91.172s -> 99.477s.
Let's start with the speed regression: unfortunately, I believe it is inherent to the change, and we can't do much to improve or eliminate it. Fortunately, the regression isn't that large, and the gain in incrementality should justify it, given that IDEs derive their power from incrementality and laziness, not raw speed.
The memory regression is a different story. I don't think we can tolerate a 400mb regression (although, to be honest, if the choice were either that or this PR, I'm not at all sure I would prefer the memory). Fortunately, I believe a large part of it is solvable. I didn't check (I can if needed), but my belief is that there are a few factors behind the regression: (a) attributes are now duplicated between the item tree and the `attrs()` query (they don't share an `Arc` anymore), and (b) `ErasedFileAstId` grew fourfold, and it is used in spans, which are already our most memory-heavy creatures. There is nothing I can do about the attributes duplication, but fortunately, I believe the majority is spans, and here I do have hope. I have an idea (maybe I really should come back to that...) for how to shrink spans considerably, and furthermore, for making stuff like the ast id not be carried by every span.

Even after all memory improvements, some regression will stay, because it's inherent to this approach. I believe we need to decide what is more important to us: incrementality or memory usage? I will point towards incrementality, given that the lack of it can make rust-analyzer barely usable on large projects, while memory is cheap.
Closes #19829.
Closes #19821.