Skip to content

Support failure store in snapshot and restore operations #107281

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
May 15, 2024

Conversation

gmarouli
Copy link
Contributor

@gmarouli gmarouli commented Apr 9, 2024

In this PR we update the snapshot and restore implementations to include the failure store of a data stream when a data stream is being snapshot.

  • When a data stream is requested to be snapshot, then it implies all backing indices and failure stores.
  • When a data stream is being restored, then it implies all backing indices and failure stores (assuming the feature flag is also enabled)
  • When individual backing or failure store indices that have been removed are being restored they need to be manually added to the data stream. This functionality worked out of the box.

@gmarouli gmarouli added >non-issue :Data Management/Data streams Data streams and their lifecycles labels Apr 9, 2024
@gmarouli gmarouli changed the title Implement snapshot and restore for failure store data streams Support failure store in snapshot and restore operations Apr 9, 2024
@gmarouli
Copy link
Contributor Author

We believe the test failure is not related to the failure store changes but it's a result of the restoration of the match_only_text mapper. We created a test to capture the minimal conditions that cause this failure: #107514

@gmarouli gmarouli marked this pull request as ready for review April 16, 2024 10:26
@gmarouli gmarouli requested a review from a team as a code owner April 16, 2024 10:26
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Apr 16, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@gmarouli gmarouli requested review from jbaiera and nielsbauman April 16, 2024 10:28
Copy link
Contributor

@nielsbauman nielsbauman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome that you got this to work with relatively few changes! I only left some comments on the functionality itself for now. I'll have another look at the tests some other time.

Copy link
Member

@jbaiera jbaiera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments and questions. Looking good though, frustrating thing that cluster state problem.

@@ -555,7 +566,9 @@ private static Tuple<Map<String, DataStream>, Map<String, DataStreamAlias>> getD
List<String> requestedDataStreams = filterIndices(
snapshotInfo.dataStreams(),
Stream.of(requestIndices, featureStateDataStreams).flatMap(Collection::stream).toArray(String[]::new),
IndicesOptions.fromOptions(true, true, true, true)
DataStream.isFailureStoreFeatureFlagEnabled()
? IndicesOptions.lenientExpandIncludeFailureStore()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the filterIndices function - it doesn't seem to actually make any use of whether to include failure stores or not. Could we leave this unchanged?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I made it more explicit in the javadoc and only kept the IndicesOptions.lenientExpand().

return renameIndex(index, request, DataStream.BACKING_INDEX_PREFIX);
}

private static String renameFailureIndex(String index, RestoreSnapshotRequest request) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks identical to the above method. I personally think it scans a little better while reading, but the duplication makes me think that it's probably better to merge all of it and control the logic based on which prefix is present (or not present for non-data streams)

I think we're also losing a check here in renameIndex() if we're restoring an index that should not be renamed. Not sure from just reviewing if that check is important right now, but it existed beforehand and makes me worry about its removal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I commented this already but I guess I didn't; the if-statements in renameBackingIndex and renameFailureIndex are already covered by the renameIndex above. So, after removing those if-statements, it probably makes even more sense to merge the methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged the methods. Let me know if it matches your expectations


RestoreSnapshotRequest request = new RestoreSnapshotRequest();

DataStream updateDataStream = RestoreService.updateDataStream(dataStream, metadata, request);

assertEquals(dataStreamName, updateDataStream.getName());
assertEquals(Collections.singletonList(updatedIndex), updateDataStream.getIndices());
assertEquals(List.of(updatedBackingIndex), updateDataStream.getIndices());
assertEquals(List.of(updatedFailureIndex), updateDataStream.getFailureIndices());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will these tests work if the feature flag is disabled? I recall that we enable the flag just about everywhere but I can't remember if there are still cases where the tests run without it on (like for releases).

Copy link
Contributor Author

@gmarouli gmarouli May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the unit tests always have the feature flags enabled, isn't this the case?

@gmarouli
Copy link
Contributor Author

gmarouli commented May 8, 2024

@elasticmachine update branch

@gmarouli gmarouli requested review from jbaiera and nielsbauman May 8, 2024 13:59
Copy link
Contributor

@nielsbauman nielsbauman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Mary!

@gmarouli
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Member

@jbaiera jbaiera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🚀

@gmarouli
Copy link
Contributor Author

@elasticmachine update branch

@gmarouli gmarouli added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label May 14, 2024
@elasticsearchmachine elasticsearchmachine merged commit c16f7d2 into elastic:main May 15, 2024
15 checks passed
@gmarouli gmarouli deleted the failure-store-snapshots branch May 15, 2024 11:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Data Management/Data streams Data streams and their lifecycles >non-issue Team:Data Management Meta label for data/management team v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants