Prompt times don't match the real times in a screen recording, and zsh4humans seems to be getting quite an advantage #5

Looking at its code, it's hard to be sure exactly how zsh-bench measures what it claims to measure.

I'm going to trust a screen recording as the source of truth for the prompt times on my machine. I recorded zim and zsh4humans, 3 times each. Each time, the steps are:
1. Sleep 1 second. During this time I type `exit` and ENTER, so it is buffered and used as input by the next steps.
2. Print the framework name. Timing starts when this output appears in the recording.
3. Start a new shell with `HOME` set to the framework's installation dir. Timing ends when the prompt appears.

This translates to: `for f in zim zsh4humans; do repeat 3 do; sleep 1; print ${f}; HOME=${PWD:h}/${f} zsh -li; done; done`

The steps above are executed inside a git repository with 1,000 directories and 10,000 files, set up using the code [here](https://github.com/romkatv/zsh-bench/blob/74db0d298132ecb2d1bd965e17ac04099bcbe376/zsh-bench#L186-L187) and [here](https://github.com/romkatv/zsh-bench/blob/74db0d298132ecb2d1bd965e17ac04099bcbe376/zsh-bench#L210-L215).

Each framework was installed with its setup script from zsh-bench: [zim](https://github.com/romkatv/zsh-bench/blob/74db0d298132ecb2d1bd965e17ac04099bcbe376/configs/zim/setup) and [zsh4humans](https://github.com/romkatv/zsh-bench/blob/74db0d298132ecb2d1bd965e17ac04099bcbe376/configs/zsh4humans/setup).

This should ensure that the recordings use the same scenario as zsh-bench.

This is the screen recording:

![Kapture 2021-10-24 at 16 54 36](https://user-images.githubusercontent.com/4120606/138615929-de8f048a-6109-4e5a-9cf0-6d3038ddfd6b.gif)

These are the times extracted from the recording frames:

| run | begin frame | prompt frame | frames | time (ms) |
| -- | -- | -- | -- | -- |
| zim 1 | 161 | 169 | 9 | 300.000 |
| zim 2 | 200 | 207 | 8 | 266.667 |
| zim 3 | 238 | 246 | 9 | 300.000 |
| zsh4humans 1 | 276 | 282 | 7 | 233.333 |
| zsh4humans 2 | 316 | 321 | 6 | 200.000 |
| zsh4humans 3 | 354 | 359 | 6 | 200.000 |

The recording is 30 fps, so each frame represents 33.333 ms, and the frame count includes both the begin frame and the prompt frame. The error of each measurement above can be taken as ±33.333 ms.
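
For reference, a minimal Python sketch of how the `frames` and `time (ms)` columns follow from the frame indices. It assumes the count is inclusive of both the begin and the prompt frame, which is what reproduces the table; `frames_to_ms` is just an illustrative name:

```python
from __future__ import annotations

# Convert frame indices read from the 30 fps recording into milliseconds.
# Assumption: the count is inclusive (begin and prompt frames both count),
# which reproduces the "frames" column of the table.
FPS = 30

# (run, begin frame, prompt frame), copied from the table
RUNS = [
    ("zim 1", 161, 169),
    ("zim 2", 200, 207),
    ("zim 3", 238, 246),
    ("zsh4humans 1", 276, 282),
    ("zsh4humans 2", 316, 321),
    ("zsh4humans 3", 354, 359),
]


def frames_to_ms(begin: int, prompt: int, fps: int = FPS) -> tuple[int, float]:
    """Return (frame count, elapsed ms) for one run."""
    frames = prompt - begin + 1
    return frames, frames * 1000 / fps


error_ms = 1000 / FPS  # one frame: the resolution of every measurement
for name, begin, prompt in RUNS:
    frames, ms = frames_to_ms(begin, prompt)
    print(f"{name:<14} {frames} frames  {ms:8.3f} ms  (±{error_ms:.3f} ms)")
```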

This is what I get when running zsh-bench on the same machine, 3 times for each framework:

```
❯ ./zsh-bench zim
==> setting up a container for benchmarking ...
==> benchmarking zim ...
creates_tty=0
has_compsys=1
has_syntax_highlighting=1
has_autosuggestions=1
has_git_prompt=1
first_prompt_lag_ms=115.790
first_command_lag_ms=149.910
command_lag_ms=62.535
input_lag_ms=28.830
exit_time_ms=47.530

❯ ./zsh-bench zim
==> setting up a container for benchmarking ...
==> benchmarking zim ...
creates_tty=0
has_compsys=1
has_syntax_highlighting=1
has_autosuggestions=1
has_git_prompt=1
first_prompt_lag_ms=196.126
first_command_lag_ms=229.765
command_lag_ms=142.110
input_lag_ms=29.402
exit_time_ms=48.146

❯ ./zsh-bench zim
==> setting up a container for benchmarking ...
==> benchmarking zim ...
creates_tty=0
has_compsys=1
has_syntax_highlighting=1
has_autosuggestions=1
has_git_prompt=1
first_prompt_lag_ms=196.878
first_command_lag_ms=231.291
command_lag_ms=140.756
input_lag_ms=27.556
exit_time_ms=48.560

❯ ./zsh-bench zsh4humans
==> setting up a container for benchmarking ...
==> benchmarking zsh4humans ...
creates_tty=1
has_compsys=1
has_syntax_highlighting=1
has_autosuggestions=1
has_git_prompt=1
first_prompt_lag_ms=30.361
first_command_lag_ms=100.323
command_lag_ms=4.997
input_lag_ms=14.536
exit_time_ms=10.895

❯ ./zsh-bench zsh4humans
==> setting up a container for benchmarking ...
==> benchmarking zsh4humans ...
creates_tty=1
has_compsys=1
has_syntax_highlighting=1
has_autosuggestions=1
has_git_prompt=1
first_prompt_lag_ms=30.458
first_command_lag_ms=101.612
command_lag_ms=5.198
input_lag_ms=12.339
exit_time_ms=11.051

❯ ./zsh-bench zsh4humans
==> setting up a container for benchmarking ...
==> benchmarking zsh4humans ...
creates_tty=1
has_compsys=1
has_syntax_highlighting=1
has_autosuggestions=1
has_git_prompt=1
first_prompt_lag_ms=29.819
first_command_lag_ms=108.122
command_lag_ms=5.123
input_lag_ms=13.146
exit_time_ms=11.05
```
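
To aggregate runs like these, the metric lines can be collected with a few lines of Python. This is a sketch that assumes the output keeps exactly the shape shown above (`==>` progress lines followed by `name=value` metrics); `parse_zsh_bench` is a hypothetical helper, not something zsh-bench provides:

```python
from __future__ import annotations


def parse_zsh_bench(output: str) -> dict[str, float]:
    """Collect the numeric name=value lines of one zsh-bench run."""
    metrics: dict[str, float] = {}
    for line in output.splitlines():
        line = line.strip()
        # skip blank lines, "==>" progress lines and lines without "="
        if not line or line.startswith("==>") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        try:
            metrics[key] = float(value)
        except ValueError:
            continue  # ignore non-numeric values
    return metrics


# excerpt of the first zim run above
SAMPLE = """\
==> setting up a container for benchmarking ...
==> benchmarking zim ...
creates_tty=0
first_prompt_lag_ms=115.790
exit_time_ms=47.530
"""

print(parse_zsh_bench(SAMPLE)["first_prompt_lag_ms"])
```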

### What is odd

The zim first_prompt_lag_ms values from zsh-bench fluctuate quite a lot: the average is 169.598 ms with a stdev of 46.601! In the recordings the average was 288.889 ms with a stdev of 19.245, which is a stdev of only 0.577 in terms of frames (just one extra frame in one of the recordings). zsh-bench should be more precise than a 30 fps recording.
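
These numbers can be checked with Python's `statistics` module. `statistics.stdev` is the sample standard deviation (n − 1), which matches the values quoted above. Converting frames to milliseconds multiplies every sample by the same constant, so the stdev in ms is simply the stdev in frames times 33.333:

```python
import statistics

MS_PER_FRAME = 1000 / 30  # duration of one frame in the 30 fps recording

zim_bench = [115.790, 196.126, 196.878]  # first_prompt_lag_ms, 3 zsh-bench runs
zim_frames = [9, 8, 9]                   # inclusive frame counts, 3 recordings

bench_mean = statistics.mean(zim_bench)
bench_stdev = statistics.stdev(zim_bench)  # sample stdev (n - 1)
frames_stdev = statistics.stdev(zim_frames)

# Scaling every sample by MS_PER_FRAME scales the mean and stdev by the same factor.
rec_mean_ms = statistics.mean(zim_frames) * MS_PER_FRAME
rec_stdev_ms = frames_stdev * MS_PER_FRAME

print(f"zsh-bench: mean {bench_mean:.3f} ms, stdev {bench_stdev:.3f} ms")
print(f"recording: mean {rec_mean_ms:.3f} ms, stdev {rec_stdev_ms:.3f} ms "
      f"({frames_stdev:.3f} frames)")
```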

Also, judging from zsh-bench's code, the reported value seems to be the minimum of the samples taken during a run. It would be good to also know the stdev of the values within a single zsh-bench run, to make sure they are not fluctuating.
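
What I have in mind is something like the sketch below, assuming the raw per-iteration samples of a metric are available (I haven't confirmed how zsh-bench stores them). `summarize` is a hypothetical helper that reports the spread next to the minimum:

```python
from __future__ import annotations

import statistics


def summarize(samples: list[float]) -> dict[str, float]:
    """Hypothetical helper: spread of one metric's samples within a single run.

    Reporting only min() hides how much the samples fluctuate, so this also
    returns the mean, the sample stdev (n - 1) and the max.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate a stdev")
    return {
        "min": min(samples),
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),
        "max": max(samples),
    }
```

A large `stdev` relative to `min` would flag a noisy run even when the reported minimum looks good.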

I could compare the minimum recording time with the minimum zsh-bench time for each framework, but I'll use the averages since zsh-bench's stdev for zim was so high. For completeness, the first_prompt_lag_ms average for zsh4humans is 30.213 ms, with a stdev of 0.344. In the recordings the average was 211.111 ms, with a stdev of 19.245 (again, just one extra frame in one of the recordings).

It's odd, then, that zsh-bench reports zim 1.703x faster than the recordings (I was expecting much closer values), and zsh4humans about 6.99x faster!
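
A sketch that recomputes the averages and ratios from the raw values in this issue (zsh-bench output and recording frame counts at 30 fps). The ratio is the recording mean divided by the zsh-bench mean; `compare` is an illustrative name:

```python
from __future__ import annotations

import statistics

MS_PER_FRAME = 1000 / 30  # duration of one frame in the 30 fps recording

# name -> (first_prompt_lag_ms from the 3 zsh-bench runs,
#          inclusive frame counts from the 3 recordings)
DATA = {
    "zim": ([115.790, 196.126, 196.878], [9, 8, 9]),
    "zsh4humans": ([30.361, 30.458, 29.819], [7, 6, 6]),
}


def compare(name: str) -> tuple[float, float, float, float]:
    """Return (zsh-bench mean, zsh-bench stdev, recording mean, ratio)."""
    bench, frames = DATA[name]
    rec_ms = [f * MS_PER_FRAME for f in frames]
    bench_mean = statistics.mean(bench)
    rec_mean = statistics.mean(rec_ms)
    # ratio > 1: zsh-bench reports a faster first prompt than the recording shows
    return bench_mean, statistics.stdev(bench), rec_mean, rec_mean / bench_mean


for name in DATA:
    b_mean, b_stdev, r_mean, ratio = compare(name)
    print(f"{name}: zsh-bench {b_mean:.3f} ms (stdev {b_stdev:.3f}), "
          f"recording {r_mean:.3f} ms, ratio {ratio:.2f}x")
```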

If anything, zsh-bench is favoring zsh4humans by a lot, both by measuring it more consistently and by reporting times far faster than the real ones.
