Make recording of query cache hits in self-profiler much cheaper #142978
+59
−23
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Self-profile can record various types of things, some of them are not enabled, like query cache hits. Rustc currently records cache hits as "instant" measureme events, which records the thread ID, current timestamp, and constructs an individual event for each such cache hit. This is incredibly expensive, in a small hello world benchmark that just depends on serde, it makes compilation with nightly go from ~3s (with
-Zself-profile
) to ~15s (with-Zself-profile -Zself-profile-events=default,query-cache-hit
).We'd like to add query cache hits to rustc-perf (rust-lang/rustc-perf#2168), but there we only need the actualy cache hit counts, not the timestamp/thread ID metadata associated with it.
This PR changes the behavior of the
query-cache-hit
event. Instead of generating individual instant events, it simply aggregates cache hit counts per query invocation (so a combination of a query and its arguments, if I understand it correctly) using an atomic counter. At the end of the compilation session, these counts are then dumped to the self-profile log using integer events (in a similar fashion as how we record artifact sizes). I suppose that we could dedup the query invocations in rustc directly, but I don't think it's really required. In local experiments with the hello world + serde case, the query invocation records generated ~30 KiB more data in the self-profile, which was ~10% increase in this case.This changes the behavior of an existing event. But it's ofc unstable, and tbh I doubt that anyone uses this flag, when it makes compilation so much slower. I think that it will be more useful when it actually records the most useful subset of the previously gathered data (the actual query cache hit counts) with a fraction of the overhead. An alternative would be to create a new event. I used a different event kind though, so that old
analyzeme
won't choke on newly generated profilesWith this PR, the overhead of
-Zself-profile-events=default,query-cache-hit
seems to be miniscule vs just-Zself-profile
, so I also enabled query cache hits by default when self profiling is enabled.We should also modify
analyzeme
, specifically this, and make it load the integer events with query cache hit counts instead. I can do that as a follow-up, it's not required to be done in sync with this PR, and it doesn't require changes in rustc.CC @cjgillot
r? @oli-obk