Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix performance regression in LibGit2RepoInvoker #1278

Merged
merged 7 commits into from
Jun 17, 2019

Conversation

derrickstolee
Copy link
Contributor

@derrickstolee derrickstolee commented Jun 13, 2019

Today, we noticed a performance regression from 1.0.19074.2 to 1.0.19130.1 (M150).

In that release, we changed the way we allocate and refresh LibGit2Repo objects. In order to allow packfile deletions, we need to drop the handles that LibGit2 keeps open during the duration of the LibGit2Repo. The change was to share one repo object and to dispose of it after 15 minutes of inactivity. However, that doesn't take into account the startup time required to initialize the data. This can be a lot worse if the user has thousands of pack-files.

To fix this, change the model. With this change, the PackfileMaintenanceStep will request that the LibGit2Repos are closed before performing the expire command, and then will reinitialize the LibGit2 repo right after. This will keep the "warmup" time as part of the initial mount or as the background operation.

If the read-object hook or a file hydration comes in while the expire command is running, a LibGit2 repo will be initialized and it may cause some .idx files to stick around until the next run. This was already the case, but then the warmup would happen during those operations. This will happen at most once a day.

  1. How did you find this was the issue?

@jrbriggs gets all the credit for realizing that the performance difference between versions only happened in calls where one used the read-object hook. If the read-object hook wasn't called, then there was no difference in the performance numbers. But there was a significant slowdown if the read-object hook was called, and could even be several seconds for a single object.

From that, I started spelunking the diff and looked specifically at what changed around the logic in InProcessMount for downloading an object. There were two candidates:

i. The interaction with the Git credential manager had a lot of changes. I checked with @jamill about his expectations of performance numbers there, even if it is misbehaving. He and @jeschu1 thought there was no reason for more than 100ms of delay in any of that code, and other side-effects would be very visible if it required more server connections. This led me to focus hard on item (ii).

ii. The LibGit2RepoInvoker was updated in this release to no longer create a large pool of repos at mount time and only dispose of them at process end. We had shipped this independently of the packfile maintenance for this very reason of being safe, but I had anticipated any problems would be in the multi-threaded access of a single libgit2 repo. The multi-threaded access has not been a problem, as far as I can tell.

  1. How did you test the performance of this change?

To measure this, I made a change like in this commit to use a Stopwatch around the two operations. This would easily point to which of the two options above were likely the cause. After installing that version locally and running git grep foo to trigger loose object downloads, I saw the timing for the first read-object hook had 2.5 seconds in the TryGetIsBlob() method, and all subsequent calls were ~2 milliseconds. This was on my local repo where I had inflated the pack dir to ~800 packs. If a user has 7,000+ then libgit2 will take even longer to prep the packfile data structures.

This performance testing also led me to double-check this PR and notice that the first run was still slow. The trick was to add the commit "LibGit2RepoInvoker: Test a real SHA1 when initializing" because using an empty SHA-1 seems to short-cut somewhere in the logic and not actually load the object store data in the LibGit2Repo. With that change, I see all of my TryGetIsBlob() calls taking 1-2 milliseconds again, even the first one after a PackfileMaintenanceStep.

Resolves #1277

To fix a performance regression with the new LibGit2RepoInvoker,
we should now always load the LibGit2Repo from the start, and keep
it around until the PackfileMaintenanceStep runs. At that time, we
can reload the repo.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
@derrickstolee derrickstolee requested review from jrbriggs and jamill June 13, 2019 21:13
@jamill
Copy link
Member

jamill commented Jun 13, 2019

I think the main issue was the potential null ref exception - otherwise looks good Thanks!

Copy link
Member

@wilbaker wilbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two more minor comments on the unit tests

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Copy link
Member

@kewillford kewillford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Couple of things.

  1. How did you find this was the issue?
  2. How did you test the performance of this change?

Great job tracking this down!

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
In order to force the LibGit2Repo to load the object store information
upon initialization, it does not suffice to use the empty SHA, as that
is short-cut somewhere and does not acheive the goals we want.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
@derrickstolee
Copy link
Contributor Author

@kewillford I liked your questions so much I put them in the PR description. Also, you led me to double-check the numbers and found that I needed another commit.

@jeschu1
Copy link
Member

jeschu1 commented Jun 14, 2019

@derrickstolee great summary and description of the problem / fix!

I notice we have several functional test failures. It looks like they may be related to left over .idx files.
Ex:
Prefetching commits and trees failed: Unable to delete C:\Repos\GVFSFunctionalTests\enlistment\3ae33ecc316b48eeb393.gvfs.gvfsCache\735ad590dc55429aa9764a55743ed1df\gitObjects\pack\prefetch-1560524212-55e96e5fe5a54c988154a4f86b32c997.idx

Could this be because we aren't disposing (i.e. the idx is still in use)?

@jamill
Copy link
Member

jamill commented Jun 14, 2019

Maybe prefetch should shutdown the libgit2 repo while it is running (and restart it when it finishes...)?

@derrickstolee
Copy link
Contributor Author

I’ll be able to get to the prefetch thing on Monday. I think this has always been a potential problem, but the change in this PR is that we preload the libgit2 data on mount time, creating handles.

Also, unmount GVFS when running tests, as the mount process has
libgit2 repos open.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
If we do not unmount, then the packs do not get deleted and we skip
writing the commit-graph due to no new packs, failing the test.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
@derrickstolee
Copy link
Contributor Author

All tests pass with the latest updates (timing the mount/unmount with the places where we want to delete packs during prefetch).

@jamill
Copy link
Member

jamill commented Jun 17, 2019

Thanks! Latest changes look good to me

@derrickstolee derrickstolee merged commit f91c486 into microsoft:master Jun 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LibGit2 repo startup time affects read-object hook and hydration
6 participants