Skip to content

Remove initial_master_nodes on node restart #37580

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented Jan 17, 2019

Some tests (DedicatedClusterSnapshotRestoreIT.testRestoreIndexWithShardsMissingInLocalGateway) were split-braining since being switched to Zen2 because the bootstrap setting was left around when nodes got restarted with data folders wiped.

The test in question here was starting one node (which autobootstrapped to that single node), then another node. The first node was then shut down (after excluding it from the voting configuration), its data folder wiped, and restarted. After restart, the node had an empty data folder yet initial_master_nodes set to itself (i.e. same name). This made the node sometimes form a cluster of its own, and not rejoin the existing cluster with the other node.

See e.g. https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+g1gc/278/console

@ywelsch ywelsch added >non-issue >test Issues or PRs that are addressing/adding tests v7.0.0 :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Jan 17, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@ywelsch
Copy link
Contributor Author

ywelsch commented Jan 17, 2019

@elasticmachine retest this please

@ywelsch
Copy link
Contributor Author

ywelsch commented Jan 18, 2019

@elasticmachine run the gradle build tests 1

@@ -935,7 +935,10 @@ Settings closeForRestart(RestartCallback callback, int minMasterNodes) throws Ex
}
if (minMasterNodes >= 0) {
assert DISCOVERY_ZEN_MINIMUM_MASTER_NODES_SETTING.exists(newSettings.build()) == false : "min master nodes is auto managed";
newSettings.put(DISCOVERY_ZEN_MINIMUM_MASTER_NODES_SETTING.getKey(), minMasterNodes).build();
newSettings.put(DISCOVERY_ZEN_MINIMUM_MASTER_NODES_SETTING.getKey(), minMasterNodes);
if (callbackSettings != null && INITIAL_MASTER_NODES_SETTING.exists(callbackSettings) == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think returning null from onNodeStopped is a bug since it looks equivalent to returning Settings.EMPTY. The only place I can see this happening is QuorumGatewayIT in which the onNodeStopped override can be completely removed. Can we just do that and then assert it's not null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jup

@ywelsch
Copy link
Contributor Author

ywelsch commented Jan 18, 2019

@elasticmachine run the gradle build tests 1

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ywelsch
Copy link
Contributor Author

ywelsch commented Jan 18, 2019

@elasticmachine run the gradle build tests 1
@elasticmachine run the gradle build tests 2

@ywelsch ywelsch merged commit 377d96e into elastic:master Jan 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >non-issue >test Issues or PRs that are addressing/adding tests v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants