
Enforce higher priority for RepositoriesService ClusterStateApplier #59040


Merged

Conversation

@fcofdez (Contributor) commented Jul 5, 2020

This avoids shard allocation failures when the repository instance comes in the same ClusterState update as the shard allocation.

Backport of #58808
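
For context, here is a minimal toy model (illustrative Java, not the Elasticsearch source) of why applier priority matters: appliers registered in the high-priority tier observe a new cluster state before normal-priority appliers do, so anything they set up, such as a repository instance, is already visible to later appliers within the same state update.

import java.util.ArrayList;
import java.util.List;

// Toy model of prioritised cluster-state appliers (illustration only, all
// names hypothetical). The fix boils down to running the repositories
// applier in the high-priority tier, so that the applier which allocates
// shards sees the repository created from the very same ClusterState update.
interface StateApplier {
    void applyClusterState(String newState);
}

final class ApplierServiceSketch {
    private final List<StateApplier> highPriority = new ArrayList<>();
    private final List<StateApplier> normalPriority = new ArrayList<>();

    void addHighPriorityApplier(StateApplier applier) {
        highPriority.add(applier);
    }

    void addStateApplier(StateApplier applier) {
        normalPriority.add(applier);
    }

    void apply(String newState) {
        // High-priority tier runs first; within a tier, registration order wins.
        for (StateApplier applier : highPriority) {
            applier.applyClusterState(newState);
        }
        for (StateApplier applier : normalPriority) {
            applier.applyClusterState(newState);
        }
    }
}

In these terms, the change registers the RepositoriesService applier with higher priority than before, guaranteeing it runs ahead of the applier that starts shard allocation.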

@fcofdez added the >enhancement, :Distributed Coordination/Cluster Coordination, backport, Team:Distributed (Obsolete), and v7.9.0 labels on Jul 5, 2020
@elasticmachine (Collaborator) commented

Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)

@fcofdez force-pushed the repositories-cluster-state-applier-priority-7.x branch from 08c239a to 60ff993 on July 5, 2020 at 11:47
@original-brownbear (Contributor) commented

@fcofdez it looks like the relevant test that this PR adds failed on this PR's CI. Could this be a possible failure mode in master as well?

@fcofdez (Contributor, Author) commented Jul 6, 2020

It seems like the change introduced in 3b71a31 wasn't ported to 7.x; in particular, the failure is triggered by trying to modify DISCOVERY_ZEN_MINIMUM_MASTER_NODES_SETTING in InternalTestCluster.java while the master is still blocked waiting for a third data node to join. Does that setting still apply? @ywelsch @original-brownbear
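
(Background, not from this thread: discovery.zen.minimum_master_nodes is meant to track a quorum of the master-eligible nodes, which is why InternalTestCluster rewrites it as nodes join and leave. The standard quorum rule, as a hypothetical helper:)

// Background illustration only; this helper is hypothetical, not framework code.
// Zen1 guidance: discovery.zen.minimum_master_nodes should be a quorum.
final class Zen1Quorum {
    static int minimumMasterNodes(int masterEligibleNodes) {
        return (masterEligibleNodes / 2) + 1; // a majority of the master-eligible nodes
    }
}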

@DaveCTurner (Contributor) commented

Indeed #39466 was a master-only change since we must still support Zen1 in 7.x for the purposes of rolling upgrades from 6.x. It seems strange that the cluster hasn't recovered by this point, however; I'll dig deeper.

@fcofdez (Contributor, Author) commented Jul 6, 2020

Thanks for the clarification @DaveCTurner. In the test I force recovery to be held until a third data node joins the cluster; the reason is to avoid race conditions around the registration of the cluster state listener that collects the data for the assertions. So I think the behavior is somewhat expected, but I'm not sure whether we have a different way to avoid that possible race condition in 7.x without postponing recovery until an additional node joins.
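
(A hypothetical sketch of the ordering the test needs, not the actual test code; it assumes the usual ESIntegTestCase context, with imports omitted:)

// Register the observer before any state update that could allocate the
// restored shards; otherwise the listener can miss the update under test.
ClusterService clusterService = internalCluster().getInstance(ClusterService.class);
clusterService.addListener(event -> {
    // record which cluster states were seen, for the later assertions
});
// Only now release recovery (in the test: let the third data node join).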

@DaveCTurner (Contributor) commented

I see; sorry for the delay, it's taken me over an hour to shave all the yaks needed to run this darn test (it required an IntelliJ upgrade). I understand now.

I think you can avoid these checks with autoManageMasterNodes:

diff --git a/x-pack/plugin/searchable-snapshots/src/test/java/org/elasticsearch/xpack/searchablesnapshots/ClusterStateApplierOrderingTests.java b/x-pack/plugin/searchable-snapshots/src/test/java/org/elasticsearch/xpack/searchablesnapshots/ClusterStateApplierOrderingTests.java
index 8277714a3ec..3a21761fcfb 100644
--- a/x-pack/plugin/searchable-snapshots/src/test/java/org/elasticsearch/xpack/searchablesnapshots/ClusterStateApplierOrderingTests.java
+++ b/x-pack/plugin/searchable-snapshots/src/test/java/org/elasticsearch/xpack/searchablesnapshots/ClusterStateApplierOrderingTests.java
@@ -35,10 +35,13 @@ import static org.hamcrest.Matchers.equalTo;
 import static org.hamcrest.Matchers.greaterThan;
 import static org.hamcrest.Matchers.is;

-@ESIntegTestCase.ClusterScope(scope = TEST, numDataNodes = 2)
+@ESIntegTestCase.ClusterScope(scope = TEST, numDataNodes = 0, autoManageMasterNodes = false)
 public class ClusterStateApplierOrderingTests extends BaseSearchableSnapshotsIntegTestCase {

     public void testRepositoriesServiceClusterStateApplierIsCalledBeforeIndicesClusterStateService() throws Exception {
+        internalCluster().setBootstrapMasterNodeIndex(0);
+        internalCluster().startNodes(2);
+
         final String fsRepoName = "fsrepo";
         final String indexName = "test-index";
         final String restoredIndexName = "restored-index";
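
(A note on this diff, inferred from the test-framework behaviour rather than stated above: with autoManageMasterNodes = false, InternalTestCluster no longer manages master-node bookkeeping, including the Zen1 minimum_master_nodes updates that tripped the test, so the test bootstraps the master itself via setBootstrapMasterNodeIndex(0) and starts its two nodes explicitly.)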

@fcofdez (Contributor, Author) commented Jul 6, 2020

retest this please

2 similar comments:

@fcofdez (Contributor, Author) commented Jul 6, 2020

retest this please

@fcofdez (Contributor, Author) commented Jul 6, 2020

retest this please

@DaveCTurner (Contributor) left a review comment

LGTM

@fcofdez merged commit 0752a86 into elastic:7.x on Jul 7, 2020