Skip to content

Fix Snapshots Recording Incorrect Max. Segment Counts #74291

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 18, 2021

Conversation

original-brownbear
Copy link
Contributor

If sequence numbers are equal across snapshots (and thus no files get written to the repository)
the segment count in the index commit is not reliable as it may have changed due to background merges.
=> fixed by always using the segment count determined from the file names in the snapshot instead

closes #74249

I'm not the biggest fan of my test here to be honest, but I couldn't find a way to have more fine grained control over background merges so I went with this rather brute force approach (alternative test suggestions very welcome :)). The test does however reproduce the problem reliable in 100% of runs for me.

If sequence numbers are equal across snapshots (and thus no files get written to the repository)
the segment count in the index commit is not reliable as it may have changed due to background merges.
=> fixed by always using the segment count determined from the file names in the snapshot instead

closes elastic#74249
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jun 18, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Copy link
Contributor

Could we set index.merge.enabled: false to block merges for the first snapshot, and then remove that setting (close & reopen the index) for the second snapshot. Might need to force-merge after reopening to trigger a merge, but that won't update the force-merge ID if maxNumSegments ≤ 0.

@original-brownbear
Copy link
Contributor Author

Could we set index.merge.enabled: false to block merges for the first snapshot,

Done, so much cuter :) Thanks!

final int rounds = scaledRandomIntBetween(50, 300);
for (int i = 0; i < rounds; ++i) {
final int numDocs = scaledRandomIntBetween(100, 1000);
BulkRequestBuilder request = client().prepareBulk();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be better to refresh on each bulk to force a segment to be written:

Suggested change
BulkRequestBuilder request = client().prepareBulk();
BulkRequestBuilder request = client().prepareBulk().setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);

I suspect if we did that we wouldn't need so many rounds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ right done in d629c74

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

.prepareUpdateSettings(indexName)
.setSettings(
Settings.builder()
.put(MergePolicyConfig.INDEX_MERGE_POLICY_SEGMENTS_PER_TIER_SETTING.getKey(), "2")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 we know we did ≥3 rounds, each of which made ≥1 segment, and they're all pretty small (≤2MB) so this guarantees a merge IIUC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ that was assumption here. Also ran a few hundred rounds of this to verify and it seems to hold 🤞

@original-brownbear
Copy link
Contributor Author

Thanks David!

@original-brownbear original-brownbear merged commit 4e19f54 into elastic:master Jun 18, 2021
@original-brownbear original-brownbear deleted the 74249 branch June 18, 2021 12:20
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jun 18, 2021
If sequence numbers are equal across snapshots (and thus no files get written to the repository)
the segment count in the index commit is not reliable as it may have changed due to background merges.
=> fixed by always using the segment count determined from the file names in the snapshot instead

closes elastic#74249
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jun 18, 2021
If sequence numbers are equal across snapshots (and thus no files get written to the repository)
the segment count in the index commit is not reliable as it may have changed due to background merges.
=> fixed by always using the segment count determined from the file names in the snapshot instead

closes elastic#74249
original-brownbear added a commit that referenced this pull request Jun 18, 2021
If sequence numbers are equal across snapshots (and thus no files get written to the repository)
the segment count in the index commit is not reliable as it may have changed due to background merges.
=> fixed by always using the segment count determined from the file names in the snapshot instead

closes #74249
original-brownbear added a commit that referenced this pull request Jun 18, 2021
If sequence numbers are equal across snapshots (and thus no files get written to the repository)
the segment count in the index commit is not reliable as it may have changed due to background merges.
=> fixed by always using the segment count determined from the file names in the snapshot instead

closes #74249
@original-brownbear original-brownbear restored the 74249 branch April 18, 2023 20:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.13.3 v7.14.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Max Segment Count in Snapshot Index Details from Get Snapshots API can be Wrong
4 participants