Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set b-tree as default #540

Merged
merged 3 commits into from
Feb 21, 2023
Merged

set b-tree as default #540

merged 3 commits into from
Feb 21, 2023

Conversation

stefanhengl
Copy link
Member

The use of a b-tree has proven to be successful. Webserver's heap for a production instance of Sourcegraph shrank by 45% without a noticable impact on latency. Hence we set the b-tree as default. It can be disabled by setting ZOEKT_DISABLE_BTREE=true for webserver.

At the same time we remove binarySearchNgram, because it uses more disk accesses than the b-tree and keeps a reference to the posting offsets.

Once the b-tree has proven itself over a longer period of time, I will remove the alternative code path (combinedNgramOffset) and clean up the temporary structs and interfaces I created.

Test plan:

  • the b-tree is used in production for almost 2 weeks now without issue.
  • all existing tests, except the one modified in this PR, pass.

@stefanhengl stefanhengl requested a review from a team February 20, 2023 08:20
@@ -1775,7 +1775,7 @@ func TestListRepos(t *testing.T) {
Stats: RepoStats{
Shards: 1,
Documents: 4,
IndexBytes: 300,
IndexBytes: 412,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did this go up?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

combinedNgramOffset.SizeBytes() didn't account for header bytes (slice headers mostly), has a compressed representation for ascii, and has a lower overhead for low ngram counts (like here in the test case). An empty btreeIndex already uses 92 bytes. However, the cost amortizes once the tree fills up. Even small repos have a fulll b-tree with ~1000 leaf nodes.

@stefanhengl stefanhengl merged commit 00cd03c into main Feb 21, 2023
@stefanhengl stefanhengl deleted the sh/set-btree-as-default branch February 21, 2023 10:36
stefanhengl added a commit that referenced this pull request Mar 14, 2023
I removed the code for binary search ngrams in #540, so this check is
not needed anymore.
stefanhengl added a commit that referenced this pull request Mar 14, 2023
I removed the code for binary search ngrams in #540, so this check is
not needed anymore.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants