Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merging: tolerate process interruptions during merging #289

Merged
merged 5 commits into from
Feb 23, 2022

Conversation

stefanhengl
Copy link
Member

@stefanhengl stefanhengl commented Feb 22, 2022

Currently the call to zoekt-merge-index does not tollerate process interruptions, which can potentially lead to duplicate indexes. This could be the root cause for the duplicate indexes we see in production.

With this PR

  • merge.Merge returns a temporary compound shard (*.tmp)
  • zoekt-merge-index calls merge.Merge and is reponsible to delete the input shards and rename the temporary compound shard returned by merge.Merge.
  • zoekt-sourcegraph-indexserver just calls zoekt-merge-index and always expects a consistent state.

A positive side-effect is that both merge and explode now behave the same when called from the command line: both remove the input shards and leave the output shards behind.

Test Plan:

  • Unit tests needed only obvious adjustments.
  • I ran a local instance of indexserver and verified that merging doesn't leave any temp files behind.

Can be reviewed commit by commit.

First we delete the original shards, then we rename the temporary
compound shard to its final name.

If the process gets interupted we will always end up in a healthy state
without duplicates.
Since renaming is now taken care of by zoekt-merge-index, we can remove
the logic here. The advantage is that indexserver now can be sure that
zoekt-merge-index leave a consistent state behind.

This follows the behavior of explode, too.
@stefanhengl stefanhengl force-pushed the sh/improve-fault-tolerance-of-merging branch from e8d184f to 398bbe4 Compare February 22, 2022 14:09
@stefanhengl stefanhengl requested a review from a team February 22, 2022 15:07
@stefanhengl stefanhengl marked this pull request as ready for review February 22, 2022 15:07
Copy link
Member

@keegancsmith keegancsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. But I couldn't really find an example where we didn't want to delete everything and then do a rename. Why don't we just always do that and simplify zoekt.Merge interface.

The only example where you don't seem to want to do that is when merging testdata. But can't we just copy those files first to a tmp location anyways?

@stefanhengl
Copy link
Member Author

stefanhengl commented Feb 23, 2022

The only example where you don't seem to want to do that is when merging testdata. But can't we just copy those files first to a tmp location anyways?

extactly, the interface was mostly inspired to give more flexibility during testing. I am fine with copying to temp location, too. Will update the PR.

EDIT: I misread the first time and thought your are talking about the onSuccess callback in zoekt-merge-index's merge. I removed that now, but I won't touch zoekt.Merge because it mirrors nicely what Explode does. If we want to pull the delete logic into the zoekt package we should do it for both and then do it in the same PR.

@stefanhengl stefanhengl merged commit 781cfea into master Feb 23, 2022
@stefanhengl stefanhengl deleted the sh/improve-fault-tolerance-of-merging branch February 23, 2022 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants