cache-tree: integrate with sparse directory entries
The cache-tree extension was previously disabled with sparse indexes.
However, the cache-tree is an important performance feature for commands
like 'git status' and 'git add'. Integrate it with sparse directory
entries.
When writing a sparse index, completely clear and recalculate the cache
tree. By starting from scratch, the only integration necessary is to
check if we hit a sparse directory entry and create a leaf of the
cache-tree that has an entry_count of one and no subtrees.
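The rebuild described above can be illustrated with a toy model. This is a minimal sketch, not the code in cache-tree.c: the CacheTreeNode class and build_cache_tree function are hypothetical names, and the sketch assumes that index entries are plain path strings in which a sparse directory entry is marked by a trailing slash.

```python
from dataclasses import dataclass, field


@dataclass
class CacheTreeNode:
    # Number of index entries covered by this tree (hypothetical toy field).
    entry_count: int = 0
    # Child trees keyed by directory name.
    subtrees: dict = field(default_factory=dict)


def build_cache_tree(paths):
    """Build a toy cache-tree from scratch.

    Assumption of this sketch: `paths` are index entry paths, and a
    sparse directory entry is written with a trailing '/'.
    """
    root = CacheTreeNode()
    for path in paths:
        is_sparse_dir = path.endswith("/")
        parts = path.rstrip("/").split("/")
        # A file contributes to its parent directories only. A sparse
        # directory entry also contributes to the tree it names, which
        # becomes a leaf: one entry, no subtrees.
        dirs = parts if is_sparse_dir else parts[:-1]
        node = root
        node.entry_count += 1
        for name in dirs:
            node = node.subtrees.setdefault(name, CacheTreeNode())
            node.entry_count += 1
    return root


# Example: 'docs/' is a sparse directory entry, the rest are files.
example = build_cache_tree(["README.md", "docs/", "src/lib/a.c", "src/main.c"])
```

Because the tree is rebuilt from an empty root, the sparse directory entry needs no special bookkeeping beyond stopping the descent at that entry.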
Once the cache-tree exists within a sparse index, we finally get
improved performance. I test the sparse index performance using a
private monorepo with over 2.1 million files at HEAD, but with a
sparse-checkout definition that has only 68,000 paths in the populated
cone. The sparse index has about 2,000 sparse directory entries. I
compare three scenarios:
 1. Use the full index. The index size is ~186 MB.
 2. Use the sparse index. The index size is ~5.5 MB.
 3. Use a commit where HEAD matches the populated set. The full index
    size is ~5.3 MB.
The third benchmark is included as a theoretical optimum for a
repository with the same object database.
First, a clean 'git status' improves from about 3.2s to 240ms.
  Benchmark #1: full index (git status)
    Time (mean ± σ):     3.167 s ± 0.036 s    [User: 2.006 s, System: 1.078 s]
    Range (min … max):   3.100 s … 3.208 s    10 runs

  Benchmark #2: sparse index (git status)
    Time (mean ± σ):     239.5 ms ± 8.1 ms    [User: 189.4 ms, System: 226.8 ms]
    Range (min … max):   226.0 ms … 251.9 ms    13 runs

  Benchmark #3: small tree (git status)
    Time (mean ± σ):     195.3 ms ± 4.5 ms    [User: 116.5 ms, System: 84.4 ms]
    Range (min … max):   188.8 ms … 202.8 ms    15 runs
The optimum is still 45ms faster. This is due in part to the 2,000+
sparse directory entries, but there might be other optimizations to make
in the sparse-index case. In particular, I find that this performance
difference disappears when I disable FS Monitor, which is somewhat
disabled in the sparse-index case, but might still be adding overhead.
The performance numbers for 'git add .' are much closer to optimal:
  Benchmark #1: full index (git add .)
    Time (mean ± σ):     3.076 s ± 0.022 s    [User: 2.065 s, System: 0.943 s]
    Range (min … max):   3.044 s … 3.116 s    10 runs

  Benchmark #2: sparse index (git add .)
    Time (mean ± σ):     218.0 ms ± 6.6 ms    [User: 195.7 ms, System: 206.6 ms]
    Range (min … max):   209.8 ms … 228.2 ms    13 runs

  Benchmark #3: small tree (git add .)
    Time (mean ± σ):     217.6 ms ± 5.4 ms    [User: 131.9 ms, System: 86.7 ms]
    Range (min … max):   212.1 ms … 228.4 ms    14 runs
In this test, I also used "echo >>README.md" to append a line to the
README.md file, so the 'git add .' command is doing _something_ other
than a no-op. Without this edit (and with FS Monitor enabled), the small
tree case again gains about 30ms on the sparse index case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>