Make `-Z self-profile` more efficient

The self-profiling feature is going to make profiling the compiler's performance a lot easier. However, a recent first stab at collecting more detailed information (see https://github.com/rust-lang/rust/pull/58085) still has too much overhead.

Here are some of the things that could be improved:
- [x] Move post-processing of the collected data out of the `rustc` process as much as possible. [SelfProfiler::get_results()](https://github.com/rust-lang/rust/blob/576df31bedd35a1c7336ce7259bbe93ab662edef/src/librustc/util/profiling.rs#L333-L339) does a lot of work to generate the statistics from the collected events. All of this should probably be moved to a separate tool that runs after profiling is done.

- [x] Reduce the amount of dispatch and locking that needs to be done for each event. For each event we have to get exclusive access to the profiler (`RefCell`/`parking_lot` mutex) and then look up the event stream for the current thread in an `FxHashMap`. This should probably be solved via thread-local data somehow.

- [x] Reduce the size of events. [Events](https://github.com/rust-lang/rust/blob/576df31bedd35a1c7336ce7259bbe93ab662edef/src/librustc/util/profiling.rs#L20-L28) are quite big (32 bytes on `x86_64` would be my guess). The timestamp can be reduced to 64 bits if we just measure the time from process start. The `&str` containing the query name can be replaced by a 4-byte tag.

- [x] Persist events to disk in a binary format. We should probably open a memory-mapped file per thread that we write events to directly. If events don't contain pointers, they can be written to disk verbatim. The post-processing tool can then convert them to something platform-independent.
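
To make the last three points a bit more concrete, here is a rough sketch of what a pointer-free event and a per-thread event buffer could look like. This is not what `rustc` does; `RawEvent`, `record`, `take_events_as_bytes`, the `kind` field and the 16-byte layout are all made up for illustration. It also swaps two things relative to the list above: it buffers into a thread-local `Vec` instead of a memory-mapped file, and it encodes fields as little-endian explicitly instead of copying the struct verbatim, just to keep the example free of `unsafe`.

```rust
use std::cell::RefCell;
use std::time::Instant;

/// Hypothetical compact event: 16 bytes, no pointers.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RawEvent {
    /// Nanoseconds elapsed since a fixed start instant (fits in 64 bits).
    pub nanos_since_start: u64,
    /// 4-byte tag standing in for the query name.
    pub query_tag: u32,
    /// Event kind, e.g. 0 = start, 1 = end (illustrative values only).
    pub kind: u32,
}

impl RawEvent {
    /// Encode the event as 16 little-endian bytes.
    pub fn to_bytes(&self) -> [u8; 16] {
        let mut out = [0u8; 16];
        out[0..8].copy_from_slice(&self.nanos_since_start.to_le_bytes());
        out[8..12].copy_from_slice(&self.query_tag.to_le_bytes());
        out[12..16].copy_from_slice(&self.kind.to_le_bytes());
        out
    }

    /// Decode an event previously produced by `to_bytes`.
    pub fn from_bytes(bytes: [u8; 16]) -> RawEvent {
        let mut ts = [0u8; 8];
        ts.copy_from_slice(&bytes[0..8]);
        let mut tag = [0u8; 4];
        tag.copy_from_slice(&bytes[8..12]);
        let mut kind = [0u8; 4];
        kind.copy_from_slice(&bytes[12..16]);
        RawEvent {
            nanos_since_start: u64::from_le_bytes(ts),
            query_tag: u32::from_le_bytes(tag),
            kind: u32::from_le_bytes(kind),
        }
    }
}

thread_local! {
    // One event buffer per thread: no shared lock and no hash map lookup.
    static EVENTS: RefCell<Vec<RawEvent>> = RefCell::new(Vec::new());
}

/// Append an event to the current thread's buffer.
pub fn record(start: Instant, query_tag: u32, kind: u32) {
    let nanos_since_start = start.elapsed().as_nanos() as u64;
    EVENTS.with(|events| {
        events.borrow_mut().push(RawEvent { nanos_since_start, query_tag, kind });
    });
}

/// Drain the current thread's buffer into a flat byte vector that could be
/// written to disk and decoded later by a separate tool.
pub fn take_events_as_bytes() -> Vec<u8> {
    EVENTS.with(|events| {
        let drained = std::mem::take(&mut *events.borrow_mut());
        let mut out = Vec::with_capacity(drained.len() * 16);
        for event in &drained {
            out.extend_from_slice(&event.to_bytes());
        }
        out
    })
}
```

With something like this, the hot path is a thread-local access plus a `Vec` push, and everything that turns the raw bytes into statistics can live outside the `rustc` process.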

Some time soon we also want to record query keys per event. This can already be done efficiently by storing the 32-bit `DepNodeIndex` that corresponds to a query (which also obviates the need to store the query name in each event). However, in order for the `DepNodeIndex` to be useful, we'll need to create and persist a mapping of `DepNodeIndex -> String` at some point before the `tcx` is destroyed (i.e. in the middle of the compilation process). I expect that creating this map will not be entirely cheap `:/`
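
As a rough idea of what persisting such a mapping could involve, here is a sketch that writes it as a flat list of length-prefixed records which a post-processing tool could read back. The `DepNodeIndex` newtype below is only a stand-in for the real compiler type, and `encode_mapping`/`decode_mapping` and the record layout are assumptions for illustration, not an existing format.

```rust
use std::collections::BTreeMap;

/// Stand-in for the compiler's 32-bit dep-node index (not the real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct DepNodeIndex(pub u32);

/// Encode each entry as: index (u32 LE), string length (u32 LE), UTF-8 bytes.
pub fn encode_mapping(map: &BTreeMap<DepNodeIndex, String>) -> Vec<u8> {
    let mut out = Vec::new();
    for (index, description) in map {
        out.extend_from_slice(&index.0.to_le_bytes());
        out.extend_from_slice(&(description.len() as u32).to_le_bytes());
        out.extend_from_slice(description.as_bytes());
    }
    out
}

/// Decode the format written by `encode_mapping`.
/// Returns `None` if the input is truncated or contains invalid UTF-8.
pub fn decode_mapping(mut bytes: &[u8]) -> Option<BTreeMap<DepNodeIndex, String>> {
    let mut map = BTreeMap::new();
    while !bytes.is_empty() {
        if bytes.len() < 8 {
            return None;
        }
        let mut word = [0u8; 4];
        word.copy_from_slice(&bytes[0..4]);
        let index = u32::from_le_bytes(word);
        word.copy_from_slice(&bytes[4..8]);
        let len = u32::from_le_bytes(word) as usize;
        bytes = &bytes[8..];
        if bytes.len() < len {
            return None;
        }
        let description = String::from_utf8(bytes[..len].to_vec()).ok()?;
        map.insert(DepNodeIndex(index), description);
        bytes = &bytes[len..];
    }
    Some(map)
}
```

The expensive part I'm worried about is not this encoding but producing the `String` for every `DepNodeIndex` in the first place, which the sketch simply takes as given.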

@Mark-Simulacrum, as you can see, the whole workflow around self-profiling will change quite a bit, so I think it's too early to add infrastructure for it to perf.rlo just yet.

cc @wesleywiser @nnethercote (and @rust-lang/wg-compiler-performance for good measure)
