-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Remove initial_master_nodes on node restart #37580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove initial_master_nodes on node restart #37580
Conversation
Pinging @elastic/es-distributed |
@elasticmachine retest this please |
@elasticmachine run the gradle build tests 1 |
@@ -935,7 +935,10 @@ Settings closeForRestart(RestartCallback callback, int minMasterNodes) throws Ex | |||
} | |||
if (minMasterNodes >= 0) { | |||
assert DISCOVERY_ZEN_MINIMUM_MASTER_NODES_SETTING.exists(newSettings.build()) == false : "min master nodes is auto managed"; | |||
newSettings.put(DISCOVERY_ZEN_MINIMUM_MASTER_NODES_SETTING.getKey(), minMasterNodes).build(); | |||
newSettings.put(DISCOVERY_ZEN_MINIMUM_MASTER_NODES_SETTING.getKey(), minMasterNodes); | |||
if (callbackSettings != null && INITIAL_MASTER_NODES_SETTING.exists(callbackSettings) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think returning null
from onNodeStopped
is a bug since it looks equivalent to returning Settings.EMPTY
. The only place I can see this happening is QuorumGatewayIT
in which the onNodeStopped
override can be completely removed. Can we just do that and then assert it's not null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jup
@elasticmachine run the gradle build tests 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@elasticmachine run the gradle build tests 1 |
Some tests (DedicatedClusterSnapshotRestoreIT.testRestoreIndexWithShardsMissingInLocalGateway) were split-braining since being switched to Zen2 because the bootstrap setting was left around when nodes got restarted with data folders wiped.
The test in question here was starting one node (which autobootstrapped to that single node), then another node. The first node was then shut down (after excluding it from the voting configuration), its data folder wiped, and restarted. After restart, the node had an empty data folder yet initial_master_nodes set to itself (i.e. same name). This made the node sometimes form a cluster of its own, and not rejoin the existing cluster with the other node.
See e.g. https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+g1gc/278/console