Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Remote Store] Update index settings once all shard copies of an index moves to remote enabled nodes #13253

Closed

Conversation

shourya035
Copy link
Member

Description

Added logic within the ShardStartedClusterStateTaskExecutor to apply remote based index settings and add remote path based index metadata once all shard copies of an index moves over to remote store enabled nodes. Currently this logic would only execute when the cluster is in mixed mode and there are remote enabled nodes present in the cluster.

The execute logic of this cluster state executor fetches the index names whose shards has been marked as STARTED, references the current RoutingTable from the incoming ClusterState to figure out if all shards of the index are in STARTED state and all those shard copies are in remote store enabled nodes. If so, it mutates the incoming metadata by adding the remote store based settings, which is then persisted and published to the data nodes from the cluster manager

Related Issues

Resolves: #13252

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added enhancement Enhancement or improvement to existing feature or request Storage:Remote labels Apr 17, 2024
@shourya035 shourya035 self-assigned this Apr 17, 2024
Copy link
Contributor

❌ Gradle check result for 498ec5c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

maybeUpdatedState = allocationService.applyStartedShards(currentState, shardRoutingsToBeApplied);
// Run remote store migration based tasks
if (ongoingDocrepToRemoteMigration(currentState.metadata().settings())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can reuse RemoteStoreNodeService#isMigratingToRemoteStore

&& (indexRoutingTable.shardsMatchingPredicateCount(ShardRouting::started) == indexRoutingTable.shardsMatchingPredicateCount(
shardRouting -> discoveryNodes.get(shardRouting.currentNodeId()).isRemoteStoreNode()
));
return allStartedShardsOnRemote && IndexMetadata.INDEX_REMOTE_STORE_ENABLED_SETTING.get(indexMetadata.getSettings()) == false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can short circuit on IndexMetadata.INDEX_REMOTE_STORE_ENABLED_SETTING.get(indexMetadata.getSettings()) == false at the start of this function .

Comment on lines 946 to 949
&& (indexRoutingTable.shardsMatchingPredicateCount(ShardRouting::started) == indexRoutingTable.shardsMatchingPredicateCount(
shardRouting -> discoveryNodes.get(shardRouting.currentNodeId()).isRemoteStoreNode()
));
return allStartedShardsOnRemote && IndexMetadata.INDEX_REMOTE_STORE_ENABLED_SETTING.get(indexMetadata.getSettings()) == false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check of (indexRoutingTable.shardsMatchingPredicateCount(ShardRouting::started) == indexRoutingTable.shardsMatchingPredicateCount( shardRouting -> discoveryNodes.get(shardRouting.currentNodeId()).isRemoteStoreNode() is not accurate , as it is not account for started shards for checking shards on remote store . We should check just for all started shards that they are on remote node.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to make sure that all the primary shards are assigned as well . There can be case when 1 shard with 0 replica is unassigned , but that copy is present on a docrep node, which is temporarily down.

* @param discoveryNodes Current set of {@link DiscoveryNodes} in the cluster
* @return {@link Tuple} with segment repository name as first element and translog repository name as second element
*/
public static Tuple<String, String> getRemoteStoreRepositoryNames(DiscoveryNodes discoveryNodes) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function doesn't fit well . Tuple also doesn't look the right datastructure for this . Would prefer two functions for retrieving translog repo and segment repo.

Copy link
Contributor

❌ Gradle check result for da6b91a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for c11bf32: ABORTED

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 4e15cf9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for da6b91a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 8d9c389: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request skip-changelog Storage:Remote
Projects
Status: ✅ Done
2 participants