Improving the performance of a 'typical' bat run

I found this cool performance analysis tool called `magic-trace` (https://github.com/janestreet/magic-trace) and used a trace recording of `bat` to try it out. I actually spotted a few things that could potentially be improved.

Recently, we made fantastic progress on reducing `bat`'s startup time (see #951), mostly by lazy-loading assets thanks to @Enselic's changes. Here, I'm not focusing on startup time per se, but rather on a "typical" run of `bat`, i.e. input files that are not gigantic or weird in any sense. Let's use `bat`'s `Cargo.toml` as an example. I'm disabling the pager because I want to focus on `bat` itself. I'm deliberately *not* using `--no-config`, to simulate a real-world use case (my config only contains comments and `--italic-text=always`). I don't have any custom assets.

First, let's measure the full execution time using hyperfine:
```
▶ hyperfine --warmup=20 "bat --paging=never --force-colorization Cargo.toml"
Benchmark 1: bat --paging=never --force-colorization Cargo.toml
  Time (mean ± σ):       8.5 ms ±   1.8 ms    [User: 7.5 ms, System: 1.6 ms]
  Range (min … max):     4.7 ms …  14.6 ms    271 runs
```

That's pretty fast, but `cat` only takes 1.0 ms ± 0.5 ms, so there is some room for improvement. Let's record a trace. I'm running `bat` once beforehand to warm up the disk cache (in order to be comparable with the benchmark above):
```
▶ bat Cargo.toml > /dev/null; magic-trace run -full-execution bat -- --paging=never --force-colorization Cargo.toml
```
The resulting `trace.fct` is attached here: [trace.zip](https://github.com/sharkdp/bat/files/11252572/trace.zip)

Now let's look at the results on https://magic-trace.org/. The full run (16ms with tracing) looks like this:

![image](https://user-images.githubusercontent.com/4209276/232537996-85a39e15-95cf-4507-bb00-5d8d4cea6b77.png)

The first 2.3ms are occupied with low-level startup procedures that we can't influence(?):

![image](https://user-images.githubusercontent.com/4209276/232538622-457979ac-d42b-4042-823d-47d38c1bc7da.png)

Next, we have some fast initialization stuff: config file loading, parsing, and command-line option handling. Altogether, this part only takes 600 µs, so there is probably not much room for improvement:

![image](https://user-images.githubusercontent.com/4209276/232539639-581034b0-4be0-48e3-923a-a63578f5a887.png)

Up next, we see the first surprise. Calling `App::config` ("get the final `Config` object based on config, command-line options, etc") takes **3.5 ms**:

![image](https://user-images.githubusercontent.com/4209276/232540036-7e23de9c-0ed3-4db1-be12-596a787afc9c.png)

Zooming in closer:

![image](https://user-images.githubusercontent.com/4209276/232540981-77980841-e278-4a27-ba02-f27022940846.png)

Most of this time is being spent compiling regexes (for glob patterns) for our syntax *mappings* (not syntax highlighting!), i.e. things like this:

![image](https://user-images.githubusercontent.com/4209276/232540781-6a020863-dfe3-4a94-97cd-0c5bc52858e7.png)

This is certainly something that could be optimized: either by somehow pre-compiling the syntax mappings (although this would only work for the builtin mappings), by parallelization, or by using a faster glob matcher(?).
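
One more variation on the same theme would be to defer compilation until a pattern is actually tested. The following is only a minimal sketch of that idea, not how `bat` is implemented: `CompiledGlob`, `LazyMapping` and `find_syntax` are hypothetical names, and the "compilation" step is a toy stand-in (a single `*` wildcard) for the real glob-to-regex work. It uses nothing but the standard library (`std::sync::OnceLock`, Rust 1.70+).

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for a compiled glob matcher. The real thing would
/// be a compiled regex; here we only support a single `*` wildcard.
enum CompiledGlob {
    Exact(String),
    Wildcard { prefix: String, suffix: String },
}

impl CompiledGlob {
    /// The (pretend) expensive step that we want to avoid doing eagerly.
    fn compile(pattern: &str) -> Self {
        match pattern.split_once('*') {
            Some((prefix, suffix)) => CompiledGlob::Wildcard {
                prefix: prefix.to_string(),
                suffix: suffix.to_string(),
            },
            None => CompiledGlob::Exact(pattern.to_string()),
        }
    }

    fn is_match(&self, path: &str) -> bool {
        match self {
            CompiledGlob::Exact(s) => path == s,
            CompiledGlob::Wildcard { prefix, suffix } => {
                path.len() >= prefix.len() + suffix.len()
                    && path.starts_with(prefix.as_str())
                    && path.ends_with(suffix.as_str())
            }
        }
    }
}

/// A mapping entry that stores the raw pattern and compiles it on first use.
struct LazyMapping {
    pattern: &'static str,
    syntax: &'static str,
    compiled: OnceLock<CompiledGlob>,
}

impl LazyMapping {
    fn new(pattern: &'static str, syntax: &'static str) -> Self {
        LazyMapping { pattern, syntax, compiled: OnceLock::new() }
    }

    fn matches(&self, path: &str) -> bool {
        // Compiles at most once; later calls reuse the cached matcher.
        self.compiled
            .get_or_init(|| CompiledGlob::compile(self.pattern))
            .is_match(path)
    }

    fn is_compiled(&self) -> bool {
        self.compiled.get().is_some()
    }
}

/// Returns the syntax of the first matching entry. `find` short-circuits,
/// so entries after the first match are never compiled.
fn find_syntax(mappings: &[LazyMapping], path: &str) -> Option<&'static str> {
    mappings.iter().find(|m| m.matches(path)).map(|m| m.syntax)
}

fn main() {
    let mappings = vec![
        LazyMapping::new("*.toml", "TOML"),
        LazyMapping::new("/etc/*.conf", "INI"),
        LazyMapping::new("Makefile", "Makefile"),
    ];
    println!("{:?}", find_syntax(&mappings, "Cargo.toml"));
    let compiled = mappings.iter().filter(|m| m.is_compiled()).count();
    println!("compiled {} of {} patterns", compiled, mappings.len());
}
```

In this toy setup the run prints `Some("TOML")` and `compiled 1 of 3 patterns`. The obvious caveat: a file that matches no pattern (or only the last one) still pays for compiling everything, so this would reduce the average cost, not the worst case.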

Next up, we query the Git status (pretty cool that we can see `libgit2` internals here), which takes another 2 ms:

![image](https://user-images.githubusercontent.com/4209276/232542144-295502be-aac1-40f0-8587-6ec95f5132bc.png)

There's probably nothing to optimize here, but potentially we could already start other things in parallel! Like what comes next: loading and deserializing the theme and syntax sets, which takes another 2.5 ms:

![image](https://user-images.githubusercontent.com/4209276/232542869-bad017b6-92b4-4c6a-b0cd-e86af3b5948b.png)

Finally, we can start printing something to the terminal:

![image](https://user-images.githubusercontent.com/4209276/232543538-dff226a7-ddae-4c3e-9f16-7e4d18a0da81.png)

Notice how the first few calls to `print_line` take a bit longer as some regexes are being compiled lazily (I suppose):

![image](https://user-images.githubusercontent.com/4209276/232543714-4d1dec72-9d12-4613-8536-75227277ac04.png)

![image](https://user-images.githubusercontent.com/4209276/232543806-37bf6599-e2e1-4733-81e6-12858f27fdf7.png)

There's also this weird part here, which I don't understand:

![image](https://user-images.githubusercontent.com/4209276/232544031-95cb9b9b-319c-408d-a63b-ab7aad2f32f9.png)


To summarize:
- magic-trace is cool
- Our `SyntaxMapping` initialization is much slower than it could be
- I think we can optimize a few things in `bat`'s startup procedure by parallelizing individual tasks (e.g. querying Git information and loading assets)
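
To illustrate the last point, here is a rough sketch of how two independent startup tasks could overlap using scoped threads from the standard library (`std::thread::scope`, Rust 1.63+). Everything in it is an assumption for illustration: `query_git_status`, `load_assets` and `prepare_in_parallel` are hypothetical functions, and the `sleep` calls merely simulate work. The durations are arbitrary, not the measured ones.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for querying the Git status of the input file.
fn query_git_status() -> Vec<String> {
    thread::sleep(Duration::from_millis(20)); // simulated work
    vec!["M Cargo.toml".to_string()]
}

/// Hypothetical stand-in for loading and deserializing the theme and
/// syntax sets; returns the number of assets "loaded".
fn load_assets() -> usize {
    thread::sleep(Duration::from_millis(25)); // simulated work
    2
}

/// Runs the Git query on a background thread while the current thread
/// loads the assets, then waits for both results.
fn prepare_in_parallel() -> (Vec<String>, usize) {
    thread::scope(|s| {
        let git_handle = s.spawn(query_git_status);
        let assets = load_assets();
        let git = git_handle.join().expect("git status thread panicked");
        (git, assets)
    })
}

fn main() {
    let start = Instant::now();
    let (git, assets) = prepare_in_parallel();
    println!(
        "git: {:?}, assets: {}, elapsed: {:?}",
        git,
        assets,
        start.elapsed()
    );
}
```

With the two tasks overlapping, the elapsed time should be close to the longer of the two rather than their sum. Spawning a thread has its own cost, so whether this pays off for tasks in the 2 ms range would have to be measured.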

Improving the performance of a 'typical' bat run #2545