Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

list virtual branches performance #4771

Merged
merged 11 commits into from
Aug 29, 2024
Merged

Conversation

Byron
Copy link
Collaborator

@Byron Byron commented Aug 27, 2024

An early PR to capture the research and first steps towards making list_virtual_branches faster by introducing gix.
Refactoring might happen here as well.

Tasks

  • list_virtual_branches for CLI
  • figure out the best target for list_virtual_branches by profiling: gitbutler_diff:workdir()
  • Can diff::workdir() be implemented without writing objects (but also, without skipping files probably?) in git2?
    • Screenshot 2024-08-28 at 08 10 15
    • It kind-of works as it's a bit faster, doesn't write objects. Also there is no control over big files, but that's more of a diff-API issue.
  • make 'big files check' faster (1.9x)
  • remove duplicate diff::workdir() call when responding to filesystem events (reuse the data instead) - makes it 800ms faster
  • fix failing tests
  • bring big-file support back to read-only diff::workdir()
  • fix E2E test - maybe it's a sign that get_workspace_head() is required?
    • it was flaky

Notes for the Reviewer

  • recalculate_everything() is about twice as fast in this PR as it is on Master. I think I compared debug with release or something. This PR is merely equally fast for responses to filesystem changes, but will write much less to the ODB in the process as it avoids index.add_all(*).
  • diff::workdir() is read-only, but a little slower as it now has to find big files to skip first (in order to be mergeable). Overall it's still an improvement, but wants gix (in follow-up)

Follow-Ups

Shortcomings of current implementation

  • Current implementation creates objects that it isn't interested in, as a side-effect of index::add(*)

Performance Results

Everything was taken from the GitLab repository, with a --release build.

workspace big files

Before

731ms with git2.

DEBUG       ┝━ should_auto_snapshot [ 778ms | 0.94% / 31.94% ] check_if_last_snapshot_older_than: 300s
DEBUG       │  ┝━ worktree_files_larger_than_limit_as_git2_ignore_rule [ 731ms | 30.01% ] worktree_dir: "/Users/byron/dev-gb/gitlab"
DEBUG       │  ┕━ branch_lines_since_snapshot [ 24.0ms | 0.98% ] head_sha: 

After

389ms for traversing 78k files, a 1.88x improvement. It's faster as it won't have to do an actual status, it just finds untracked files. Of course, ideally one day there is no need to exclude untracked large files as untracked files as virtual branches won't 'soak up' the worktree like they do now.

DEBUG       ┝━ should_auto_snapshot [ 549ms | 0.96% / 25.52% ] check_if_last_snapshot_older_than: 300s
DEBUG       │  ┝━ worktree_files_larger_than_limit_as_git2_ignore_rule [ 389ms | 18.09% ] worktree_dir: "/Users/byron/dev-gb/gitlab"
DEBUG       │  ┕━ branch_lines_since_snapshot [ 139ms | 6.47% ] head_sha: 

duplicate workdir call

Before

INFO     handle [ 2.15s | 0.00% / 100.00% ] event: ProjectFileChange(d4abe90e-4fed-4d86-bb80-048e3740f003, bar)
INFO     ┕━ recalculate_everything [ 2.15s | 1.07% / 100.00% ] paths: 1
DEBUG       ┝━ workdir [ 698ms | 32.48% ] commit_oid: 1b885e82d93fe522a7c8a0128f57c61a12d2a4de
DEBUG       ┝━ should_auto_snapshot [ 570ms | 0.89% / 26.52% ] check_if_last_snapshot_older_than: 300s
DEBUG       │  ┝━ worktree_files_larger_than_limit_as_git2_ignore_rule [ 403ms | 18.75% ] worktree_dir: "/Users/byron/dev-gb/gitlab"
DEBUG       │  ┕━ branch_lines_since_snapshot [ 148ms | 6.87% ] head_sha: 1928e73219a69ead3324c7ad3d0d5ffeb18625fb
INFO        ┕━ calculate_virtual_branches [ 858ms | 1.34% / 39.93% ]
DEBUG          ┕━ list_virtual_branches [ 829ms | 5.39% / 38.58% ]
DEBUG             ┕━ get_applied_status [ 714ms | 1.06% / 33.20% ]
DEBUG                ┕━ workdir [ 691ms | 32.14% ] commit_oid: 1b885e82d93fe522a7c8a0128f57c61a12d2a4de

After

INFO     handle [ 1.65s | 0.00% / 100.00% ] event: ProjectFileChange(d4abe90e-4fed-4d86-bb80-048e3740f003, bar)
INFO     ┕━ recalculate_everything [ 1.65s | 1.55% / 100.00% ] paths: 1
DEBUG       ┝━ workdir [ 750ms | 45.48% ] commit_oid: 1b885e82d93fe522a7c8a0128f57c61a12d2a4de
DEBUG       ┝━ should_auto_snapshot [ 580ms | 1.20% / 35.22% ] check_if_last_snapshot_older_than: 300s
DEBUG       │  ┝━ worktree_files_larger_than_limit_as_git2_ignore_rule [ 420ms | 25.48% ] worktree_dir: "/Users/byron/dev-gb/gitlab"
DEBUG       │  ┕━ branch_lines_since_snapshot [ 141ms | 8.54% ] head_sha: 1928e73219a69ead3324c7ad3d0d5ffeb18625fb
INFO        ┕━ calculate_virtual_branches [ 293ms | 1.66% / 17.75% ]
DEBUG          ┕━ list_virtual_branches_cached [ 265ms | 7.54% / 16.09% ]
DEBUG             ┕━ get_applied_status_cached [ 141ms | 8.55% ]

Final Result

On master, unfortunately, this takes about 1,9s as well, which is largely due to workdir now actually being faster there. The overhead of doing the ignore_large_files_in_diffs() call is significant, and it eats up all wins.

2024-08-28T20:38:35.471023Z  INFO handle: crates/gitbutler-watcher/src/handler.rs:57: close time.busy=1.87s time.idle=7.54µs event=ProjectFileChange(a4dfef20-d58e-47d8-984d-9f78eb012c2a, that)

However, this is the basis gix to come in and do a status much faster, which is when everything will be consistently faster once again.

INFO     handle [ 1.91s | 0.00% / 100.00% ] event: ProjectFileChange(d4abe90e-4fed-4d86-bb80-048e3740f003, bar)
INFO     ┕━ recalculate_everything [ 1.91s | 1.23% / 100.00% ] paths: 1
DEBUG       ┝━ workdir [ 1.07s | 35.45% / 55.88% ] commit_oid: 1b885e82d93fe522a7c8a0128f57c61a12d2a4de
DEBUG       │  ┕━ ignore_large_files_in_diffs [ 390ms | 20.43% ] limit_in_bytes: 50000000
DEBUG       ┝━ should_auto_snapshot [ 549ms | 0.98% / 28.78% ] check_if_last_snapshot_older_than: 300s
DEBUG       │  ┝━ ignore_large_files_in_diffs [ 396ms | 20.75% ] limit_in_bytes: 33554432
DEBUG       │  ┕━ branch_lines_since_snapshot [ 134ms | 7.05% ] head_sha: 1928e73219a69ead3324c7ad3d0d5ffeb18625fb
INFO        ┕━ calculate_virtual_branches [ 269ms | 1.25% / 14.11% ]
DEBUG          ┕━ list_virtual_branches_cached [ 245ms | 6.02% / 12.86% ]
DEBUG             ┕━ get_applied_status_cached [ 131ms | 6.84% ]

It's notable though that technically, diff::workdir is now 25% slower at least, and all that so it won't add everything to the worktree.
And I am quite aware that there are many locations that call create_wd_tree() which does exactly that, but I suppose it has to start somewhere.

Performance

  • list_virual_branches() on the GitLab repo takes 1.16s, of which 802ms are spent on the workdir diff.
  • worktree::add(*) is most time-consuming, but it's the dirwalk that really takes time. Screenshot 2024-08-27 at 15 44 07

Copy link

vercel bot commented Aug 27, 2024

@Byron is attempting to deploy a commit to the GitButler Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions github-actions bot added the rust Pull requests that update Rust code label Aug 27, 2024
This can be useful for those who seek extra information, and while it's
dominating refresh times.
It calls `list_virtual_branches()` which kind of has the effect of a status call.
There we don't necessarily need every last bit of optimization like
one would want in release profiles, but build performance is of the essence
for developer productivity.
The worktree diff is essentially a 'git status' with a given tree,
and that can be expressed with a diff of the workdir, the index and a tree.

This comes at the expense of not being able to control which file sizes are diffed.
````
GITBUTLER_PERFORMANCE_LOG=1 LOG_LEVEL=debug pnpm tauri dev
````

Additionally, some noise was removed by turning a warning into
a trace.
…nches.

That way, the head is static which allows to re-use the worktree status across
multiple related calls.

Also, add a way to quickly get the `head_commit()` from a `git2::Repository`.

Note that this may cause breakage in the app, as now it will see the actual
head, not a computed one, but that should also help us to find places that
didn't properly update their state.
Also make sure that important `gix` crates are always compiled
with optimization for proper performance.

This is particularly useful when optimizing performance as not
everything has to be done in release mode.

However, it also causes CI to be slower as it compiles some
dependencies longer.

The problem realy is that `tauri dev` doesn't support custom profiles,
so there is only `dev` and `release`.
It's used for emission of changed files, but also for listing
branch information.

Make sure we only run the worktree-status once to save time.

This also unifies the acquisition of the `HEAD^{commit}`,
which one day will be a natural feature once `gix::Repository`
is used.
- move `gix` repository extension to where it belongs more naturally.
- make sure that `get_wd_tree()` is named more closely to what it really
  does, and use it in more places.
@Byron Byron force-pushed the git2-to-gix branch 2 times, most recently from c44d090 to c86300b Compare August 29, 2024 07:12
@Byron Byron marked this pull request as ready for review August 29, 2024 07:28
This was lost previously when switching it over to a read-only
implementation.
Implementing it with an ignore list will take time, 400ms in the GitLab
repository, but it's not slower than it was before and it is always
preferred to not dump objects into the ODB unnecessarily.
That way, the application will not run into unexpected cases.
Copy link
Member

@krlvi krlvi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exciting about cleaning up that part of the code. lets pay close attention for any issues on the nightly

@Byron Byron merged commit 47b6d41 into gitbutlerapp:master Aug 29, 2024
16 of 17 checks passed
Copy link
Collaborator Author

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few more notes on this merged PR, @Caleb-T-Owens and @krlvi, just for your information.

) -> Result<DiffByPathMap> {
let repository = context.repository();
let head_commit = repository.head_commit()?;
gitbutler_diff::workdir(repository, &head_commit.id())
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the only place where we drive gitbutler_diff::workdir() with the (possibly stale) HEAD commit, and I do wonder if that's always correct.

However, I also believe that the HEAD integration commit should not be able to go stale in the first place, and would rather want to see that fixed.

It's something that isn't attempted here just yet though.

Comment on lines +27 to +29
/// This should be used to update the `gitbutler/workspace` ref with, which is usually
/// done from [`update_gitbutler_integration()`], after any of its input changes.
/// This is namely the conflicting state, or any head of the virtual branches.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the list of dependencies that I see are used here in the hopes that is raises awareness in functions that alter these inputs.
In theory, these should adjust the integration commit to match, so eventually all read-only code can rely on HEAD to be valid.

@Byron Byron deleted the git2-to-gix branch August 29, 2024 11:26
ndom91 added a commit that referenced this pull request Aug 29, 2024
This reverts commit 47b6d41, reversing
changes made to b2cdc37.
ndom91 added a commit that referenced this pull request Aug 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rust Pull requests that update Rust code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants