Make -Z self-profile more efficient #58372

Closed

Description

The self-profiling feature is going to make profiling the compiler's performance a lot easier. However, a recent first stab at collecting more detailed information (see #58085) still has too much overhead.

Here are some of the things that could be improved:

  • Move post-processing of the collected data out of the rustc process, as much as possible. SelfProfiler::get_results() does a lot of work for generating the statistics from the collected events. All of this should probably be moved to a separate tool that runs after profiling is done.

  • Reduce the amount of dispatch and locking that needs to be done for each event. For each event we have to get exclusive access to the profiler (RefCell / parking_lot mutex) and then look up the event stream for the current thread in an FxHashMap. This should probably be solved via thread-local data somehow.

  • Reduce the size of events. Events are quite big (32 bytes on x86_64 would be my guess). The timestamp can be reduced to 64 bits if we just measure the time from process start. The &str containing the query name can be replaced by a 4 byte tag.

  • Persist events to disk in a binary format. We should probably open a memory mapped file per thread that we write events to directly. If events don't contain pointers they can be written to disk verbatim. The post-processing tool can then convert them to something platform independent.
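
To make the last three points concrete, here is a rough sketch of what a pointer-free event and a per-thread event buffer could look like. Everything named here (`QueryTag`, `RawEvent`, `record`, `drain_to_bytes`, `decode`) is hypothetical and not taken from rustc. The sketch also swaps the memory-mapped file for a plain in-memory `Vec`, and uses an explicit little-endian encoding rather than writing the struct verbatim, so that it stays free of `unsafe`.

```rust
use std::cell::RefCell;
use std::sync::OnceLock;
use std::time::Instant;

/// Hypothetical 4-byte tag that stands in for the `&str` query name.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u32)]
pub enum QueryTag {
    TypeOf = 0,
    OptimizedMir = 1,
}

/// Hypothetical compact event: 4 + 4 + 8 = 16 bytes and no pointers.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(C)]
pub struct RawEvent {
    pub tag: u32,
    pub kind: u32, // 0 = query start, 1 = query end (assumed encoding)
    pub nanos_since_start: u64,
}

/// Reference point for timestamps; set on the first recorded event.
static PROCESS_START: OnceLock<Instant> = OnceLock::new();

thread_local! {
    /// One event stream per thread: no lock and no hash map lookup per event.
    static EVENTS: RefCell<Vec<RawEvent>> = RefCell::new(Vec::with_capacity(1024));
}

/// Appends one event to the current thread's stream.
pub fn record(tag: QueryTag, kind: u32) {
    let start = *PROCESS_START.get_or_init(Instant::now);
    let nanos = start.elapsed().as_nanos() as u64;
    EVENTS.with(|events| {
        events.borrow_mut().push(RawEvent {
            tag: tag as u32,
            kind,
            nanos_since_start: nanos,
        })
    });
}

/// Empties the current thread's stream into a flat byte buffer
/// (16 bytes per event, little-endian fields).
pub fn drain_to_bytes() -> Vec<u8> {
    EVENTS.with(|events| {
        let mut events = events.borrow_mut();
        let mut out = Vec::with_capacity(events.len() * 16);
        for ev in events.drain(..) {
            out.extend_from_slice(&ev.tag.to_le_bytes());
            out.extend_from_slice(&ev.kind.to_le_bytes());
            out.extend_from_slice(&ev.nanos_since_start.to_le_bytes());
        }
        out
    })
}

fn read_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// What a separate post-processing tool might do with the raw bytes.
/// Trailing bytes that do not form a full 16-byte event are ignored.
pub fn decode(bytes: &[u8]) -> Vec<RawEvent> {
    bytes
        .chunks_exact(16)
        .map(|c| {
            let mut ts = [0u8; 8];
            ts.copy_from_slice(&c[8..16]);
            RawEvent {
                tag: read_u32(&c[0..4]),
                kind: read_u32(&c[4..8]),
                nanos_since_start: u64::from_le_bytes(ts),
            }
        })
        .collect()
}
```

In this sketch the hot path (`record`) only touches thread-local state, and all interpretation of the data happens in `decode`, which could live in an external tool. A real implementation would presumably write into a memory-mapped file instead of growing a `Vec`.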

Some time soon we also want to record query keys per event. This can already be done efficiently by storing the 32-bit DepNodeIndex that corresponds to a query (which also obviates the need to store the query name in each event). However, in order for the DepNodeIndex to be useful, we'll need to create and persist a mapping of DepNodeIndex -> String at some point before the tcx is destroyed (i.e. in the middle of the compilation process). I expect that creating this map will not be entirely cheap :/
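
As a sketch of what such a persisted mapping could look like, the snippet below serializes an index -> string table into a flat byte buffer and reads it back. `KeyTable` and its methods are hypothetical, a plain `u32` stands in for `DepNodeIndex`, and the record layout (`[u32 index][u32 length][UTF-8 bytes]`, little-endian) is an assumption made for illustration only.

```rust
use std::collections::BTreeMap;

/// Hypothetical index -> description table; `u32` stands in for `DepNodeIndex`.
pub struct KeyTable {
    entries: BTreeMap<u32, String>,
}

fn read_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

impl KeyTable {
    pub fn new() -> Self {
        KeyTable { entries: BTreeMap::new() }
    }

    pub fn insert(&mut self, index: u32, description: &str) {
        self.entries.insert(index, description.to_owned());
    }

    pub fn get(&self, index: u32) -> Option<&str> {
        self.entries.get(&index).map(|s| s.as_str())
    }

    /// Encodes every entry as [u32 index][u32 length][UTF-8 bytes].
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        for (index, text) in &self.entries {
            out.extend_from_slice(&index.to_le_bytes());
            out.extend_from_slice(&(text.len() as u32).to_le_bytes());
            out.extend_from_slice(text.as_bytes());
        }
        out
    }

    /// Decodes a buffer produced by `to_bytes`; returns `None` if the
    /// buffer is truncated or contains invalid UTF-8.
    pub fn from_bytes(mut bytes: &[u8]) -> Option<Self> {
        let mut entries = BTreeMap::new();
        while !bytes.is_empty() {
            if bytes.len() < 8 {
                return None;
            }
            let index = read_u32(&bytes[0..4]);
            let len = read_u32(&bytes[4..8]) as usize;
            let rest = &bytes[8..];
            if rest.len() < len {
                return None;
            }
            let text = std::str::from_utf8(&rest[..len]).ok()?;
            entries.insert(index, text.to_owned());
            bytes = &rest[len..];
        }
        Some(KeyTable { entries })
    }
}
```

The cost concern raised above applies to filling the table (producing a string for every index), not to the encoding itself, which is a single linear pass.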

@Mark-Simulacrum, as you can see the whole workflow around self-profiling will change quite a bit, so I think it's too early to add infrastructure for it to perf.rlo just yet.

cc @wesleywiser @nnethercote (and @rust-lang/wg-compiler-performance for good measure)

Metadata

Labels

  • C-enhancement: Category: An issue proposing an enhancement or a PR with one.

  • I-compiletime: Issue: Problems and improvements with respect to compile times.

  • T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
