Description
I found this cool performance analysis tool called magic-trace
(https://github.com/janestreet/magic-trace) and used a trace recording of bat
to try it out. I actually spotted a few things that could potentially be improved.
Recently, we made fantastic progress on reducing bat
s startup speed (see #951), mostly by lazy-loading assets thanks to @Enselic changes. Here, I'm not focusing on startup-speed per-se, but rather on a "typical" run of bat
, i.e. input files which are not gigantic or weird in any sense. Let's use bat
s Cargo.toml
as an example. I'm disabling the pager because I want to focus on bat
. I'm deliberately not using --no-config
to simulate a real world use case (my config only contains comments and --italic-text=always
). I don't have any custom assets.
First, let's measure the full execution time using hyperfine:
▶ hyperfine --warmup=20 "bat --paging=never --force-colorization Cargo.toml"
Benchmark 1: bat --paging=never --force-colorization Cargo.toml
Time (mean ± σ): 8.5 ms ± 1.8 ms [User: 7.5 ms, System: 1.6 ms]
Range (min … max): 4.7 ms … 14.6 ms 271 runs
That's pretty fast, but cat
only takes 1.0 ms ± 0.5 ms, so there is some room for improvement. Let's record a trace. I'm running bat
once beforehand to warm up the disk cache (in order to be comparable with the benchmark above):
▶ bat Cargo.toml > /dev/null; magic-trace run -full-execution bat -- --paging=never --force-colorization Cargo.toml
The resulting trace.fct
is attached here: trace.zip
Now let's look at the results on https://magic-trace.org/. The full run (16ms with tracing) looks like this:
The first 2.3ms are occupied with low-level startup procedures that we can't influence(?):
Next, we have some fast initialization stuff, config file loading, parsing and command-line option handling. Alltogether, this part only takes 600µs, so probably not much room for improvement:
Up next, we see the first surprise. Calling App::config
("get the final Config
object based on config, command-line options, etc") takes 3.5 ms:
Zooming in closer:
Most of this time is being spent compiling regexes (for glob patterns) for our syntax mappings (not syntax highlighting!), i.e. things like this:
This is certainly something that could be optimized. Either by somehow pre-compiling syntax mappings (although this would only work for the builtin mappings). Or by parallelization. Or by using a faster glob matcher(?).
Next up, we query the Git status (pretty cool that we can see libgit2
internals here), which takes another 2 ms:
There's probably nothing to optimize here, but potentially we could already start other things in parallel! Like what comes next: loading and deserializing the theme and syntax sets, which also takes 2.5 ms:
Finally, we can start printing something to the terminal:
Notice how the first few calls to print_line
take a bit longer as some regexes are being compiled lazily (I suppose):
There's also this weird part here, which I don't understand:
To summarize:
- magic-trace is cool
- Our
SyntaxMapping
initialization is much slower than it could be - I think we can optimize a few things in
bat
s startup procedure by parallelizing individual tasks (e.g. querying Git information and loading assets)