Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable most caching for packfile objects #854

Merged
merged 1 commit into from
Oct 31, 2024
Merged

Conversation

jtibshirani
Copy link
Member

By default go-git maintains an LRU cache of git objects of size 96MB. When an object's contents are loaded, it's stored as a MemoryObject in this cache. This cache is not super useful in the indexing access pattern, which accesses each file only once. And in many profiles, we see a substantial number of allocations from these memory objects.

This PR disables caching for most git objects by setting LargeObjectThreshold: 1. go-git still proactively caches packfile objects under 16KB (see smallObjectThreshold here).

Follow up to #852. This change is also gated by the ZOEKT_ENABLE_GOGIT_OPTIMIZATION feature flag.

@cla-bot cla-bot bot added the cla-signed label Oct 30, 2024
@jtibshirani
Copy link
Member Author

Here's the difference in total allocations for indexing megarepo. It's a small but consistent improvement. I also like the general strategy of being more "vanilla" with our git use (eventually moving away from go-git). This PR helps take us in that direction.

~/code/zoekt % go tool pprof -alloc_space -diff_base base.prof large1.prof
File: zoekt-git-index
Type: alloc_space
Time: Oct 29, 2024 at 12:16pm (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for -1.72GB, 3.51% of 49.12GB total
Dropped 246 nodes (cum <= 0.25GB)
Showing top 10 nodes out of 122
      flat  flat%   sum%        cum   cum%
   -3.17GB  6.46%  6.46%    -3.17GB  6.46%  github.com/go-git/go-git/v5/plumbing.(*MemoryObject).Write
   -2.63GB  5.35% 11.81%    -2.63GB  5.35%  github.com/sourcegraph/zoekt.(*postingsBuilder).newSearchableString
    1.53GB  3.12%  8.69%    -1.63GB  3.32%  io.copyBuffer
    1.01GB  2.07%  6.63%     1.01GB  2.07%  compress/flate.(*dictDecoder).init (inline)
    0.51GB  1.05%  5.58%     0.51GB  1.05%  github.com/go-git/go-git/v5/utils/sync.init.func2
    0.38GB  0.77%  4.81%     0.38GB  0.77%  bytes.NewReader
    0.35GB   0.7%  4.10%     0.35GB   0.7%  bufio.NewReaderSize
    0.30GB  0.61%  3.49%     0.30GB  0.61%  io.LimitReader
   -0.24GB  0.49%  3.98%    -0.36GB  0.74%  github.com/sourcegraph/go-ctags.(*ctagsProcess).Parse
    0.23GB  0.47%  3.51%     1.24GB  2.53%  compress/flate.NewReader

This change was motivated by several "Allocated Heap" profiles we see on GCP, where MemoryObject.Write dominates allocations:
Screenshot 2024-10-30 at 12 41 26 PM

@jtibshirani jtibshirani requested a review from a team October 30, 2024 19:43
Copy link
Member

@keegancsmith keegancsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yay!

@jtibshirani jtibshirani merged commit 27429a1 into main Oct 31, 2024
9 checks passed
@jtibshirani jtibshirani deleted the jtibs/large-objects branch October 31, 2024 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants