Keep track of desired nodes cluster membership #84165
Conversation
Pinging @elastic/es-distributed (Team:Distributed)
Hi @fcofdez, I've created a changelog YAML for you.
@@ -140,8 +140,13 @@ static ClusterState updateDesiredNodes(ClusterState currentState, UpdateDesiredN
        }
    }

    DesiredNodesMetadata.Builder desiredNodesMetadataBuilder = new DesiredNodesMetadata.Builder(proposedDesiredNodes);
I think we should take into account previous desired nodes marked as members. I'll fix it.
Tackled in 42d504a
Did not get through it all, but wanted to relay a few early comments here.
history_id: "test"
version: 2
nodes:
  - { settings: { node: { name: "instance-000187" } }, processors: 8, memory: "64gb", storage: "128gb", node_version: $es_version }
I think the new contains assertion no longer catches the case where there are too many desired nodes; perhaps we can add a check that there are only 2?
It looks like we are trying to preserve order anyway, so perhaps we could even keep matching against the list?
final String historyId = in.readString();
final long version = in.readLong();
if (in.getVersion().onOrAfter(Version.CURRENT)) {
    return new DesiredNodes(historyId, version, in.readMap(StreamInput::readString, DesiredNode::new));
I think we should preserve order here too? Or perhaps even switch to an ordered map (TreeMap) to ensure a consistent ordering?
I wonder if we even need to change the serialization here? I think we are sending the externalId twice now, and might as well send it as a list and then reconstruct it on the receiver side?
I was thinking about that too. I would expect that the number of desired nodes would be fairly low in most cases. Let me revert this change.
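The ordering idea discussed above can be sketched outside of Elasticsearch. This is a minimal illustration (class and method names here are invented, not the actual StreamInput/DesiredNodes API): keying the nodes by externalId in a TreeMap gives a deterministic iteration order, so the wire format can be just the list of values, with the map rebuilt on the receiving side instead of sending each key twice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OrderedNodesSketch {
    // Stand-in for a desired node; externalId doubles as the map key.
    public record Node(String externalId, int processors) {}

    public static List<Node> serialize(Map<String, Node> nodes) {
        // TreeMap iterates in sorted key order, so this list is deterministic.
        return new ArrayList<>(nodes.values());
    }

    public static Map<String, Node> deserialize(List<Node> nodes) {
        Map<String, Node> result = new TreeMap<>();
        for (Node n : nodes) {
            result.put(n.externalId(), n); // rebuild the key from the value
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Node> original = new TreeMap<>();
        original.put("instance-000188", new Node("instance-000188", 8));
        original.put("instance-000187", new Node("instance-000187", 8));
        Map<String, Node> roundTripped = deserialize(serialize(original));
        System.out.println(roundTripped.keySet()); // sorted, same on every node
    }
}
```

The round trip never writes the key separately, yet both sides agree on ordering because both use the same sorted map.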
  - contains: { nodes: { settings: { node: { external_id: "instance-000187" } }, processors: 8, memory: "64gb", storage: "128gb", node_version: $es_version } }
  - contains: { nodes: { settings: { node: { external_id: "instance-000188" } }, processors: 8, memory: "64gb", storage: "128gb", node_version: $es_version } }
---
"Test update desired nodes is idempotent with different order":
👍 , but would be nice to even split this out into a separate PR? Or is the change too coupled with the rest of this PR?
Sure, let me split this PR 👍
I opened a new PR #84227 with these changes and removed them from this PR. I think it is better to hold this PR's review until we get that one merged.
Force-pushed 1446da3 to 66532d8 (compare)
Hi @fcofdez, I've created a changelog YAML for you.
This commit adds tracking for desired nodes cluster membership. When desired nodes are updated they are matched against the current cluster members. Additionally when a node joins the cluster the desired nodes cluster membership is updated.
Force-pushed d9d76b8 to 35d68ef (compare)
Hi @fcofdez, I've created a changelog YAML for you.
Sorry for the delay here. I wonder if we should make cluster membership a dynamically calculated property rather than persist it into metadata? For instance, the current PR will IIUC keep all nodes as members if all nodes are abruptly shut down and only some are started? Also, at a higher level, I think knowledge about the current nodes in the cluster should not go into metadata, to avoid persisting such information.
I can see a potential need for whether we ever saw a desired node, which would need to be in metadata. But I think the purpose here is different?
Left a couple random comments, not too important at this time.
}

private final DesiredNodes latestDesiredNodes;
private final Set<DesiredNode> clusterMembers;
private final Set<DesiredNode> notClusterMembers;

public DesiredNodesMetadata(DesiredNodes latestDesiredNodes) {
I think we should make this private now that we have the Builder?
private DesiredNodes desiredNodes;
private final Set<DesiredNode> clusterMembers;

public Builder(DesiredNodesMetadata desiredNodesMetadata) {
nit: We often add a builder() method to the outer class. I like that, since it makes the client code slightly shorter and reads slightly better, I think.
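For illustration, the pattern being suggested looks roughly like this (a hypothetical sketch, not the actual DesiredNodesMetadata class): the outer class keeps its constructor private and exposes a static builder() factory, so clients write `MetadataSketch.builder()...` rather than `new MetadataSketch.Builder(...)`.

```java
import java.util.HashSet;
import java.util.Set;

public class MetadataSketch {
    private final Set<String> clusterMembers;

    // Private: instances are only created through the Builder.
    private MetadataSketch(Set<String> clusterMembers) {
        this.clusterMembers = Set.copyOf(clusterMembers);
    }

    // Static factory on the outer class, per the reviewer's nit.
    public static Builder builder() {
        return new Builder();
    }

    public Set<String> clusterMembers() {
        return clusterMembers;
    }

    public static class Builder {
        private final Set<String> members = new HashSet<>();

        public Builder addMember(String externalId) {
            members.add(externalId);
            return this; // fluent chaining
        }

        public MetadataSketch build() {
            return new MetadataSketch(members);
        }
    }

    public static void main(String[] args) {
        MetadataSketch metadata = MetadataSketch.builder().addMember("instance-000187").build();
        System.out.println(metadata.clusterMembers());
    }
}
```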
@henningandersen I've followed a different approach where the membership is tracked in-memory, let me know if you prefer this approach. Thanks!
Thanks, this direction looks fine. I am not sure we need the quarantining/member removal though; I think we should simply remove that.
@@ -34,7 +36,7 @@
import static java.lang.String.format;
import static org.elasticsearch.node.Node.NODE_EXTERNAL_ID_SETTING;

-public class DesiredNodes implements Writeable, ToXContentObject {
+public class DesiredNodes extends AbstractCollection<DesiredNode> implements Writeable, ToXContentObject {
I think most of the uses here can work with just Iterable, and I prefer that over adding a bunch of methods to the DesiredNodes interface that do not work.
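The distinction can be sketched in isolation (illustrative names, not the real DesiredNodes class): implementing Iterable<T> exposes only iteration, whereas extending AbstractCollection would inherit the whole Collection API (add, remove, contains, size, ...), much of which makes no sense on an effectively immutable holder.

```java
import java.util.Iterator;
import java.util.List;

// Exposes iteration and nothing else; callers cannot mutate it
// through an inherited Collection API.
public class NodesSketch implements Iterable<String> {
    private final List<String> externalIds;

    public NodesSketch(List<String> externalIds) {
        this.externalIds = List.copyOf(externalIds); // defensive, immutable copy
    }

    @Override
    public Iterator<String> iterator() {
        return externalIds.iterator();
    }

    public static void main(String[] args) {
        NodesSketch nodes = new NodesSketch(List.of("instance-000187", "instance-000188"));
        for (String id : nodes) { // for-each works with plain Iterable
            System.out.println(id);
        }
    }
}
```

Most call sites that only loop over the nodes compile unchanged against either shape, which is the reviewer's point.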
for (DiscoveryNode removedNode : nodesDelta.removedNodes()) {
    final var desiredNode = desiredNodes.find(removedNode.getExternalId());
    moveToQuarantineIfMember(desiredNode);
}
I wonder if this is necessary. If we ever saw a desired node and it disappears, we expect it to reappear or a new desired nodes state to be set. Anything else is a bug in the orchestration?
From the data tier allocation decider, we start allocating shards to the cold node(s) once we see the first one. But in case a node then unexpectedly drops out, I think we would not want to start allocating them to less preferred tiers. That could potentially cause issues on higher tiers more than help. And it would sort of hide that there is an issue, since if you look at health and allocation, everything looks good.
@@ -37,6 +38,7 @@
public class DiscoveryNode implements Writeable, ToXContentFragment {

    static final String COORDINATING_ONLY = "coordinating_only";
    public static final Version EXTERNAL_ID_VERSION = Version.CURRENT;
I'd normally put in the specific version here during development to not have to remember to alter it before merging it.
@@ -205,6 +207,7 @@ public void testDiscoveryNodeToXContent() {
    assertThat(topLevelMap.toString(), detailsMap.remove("name"), equalTo("test-name"));
    assertThat(topLevelMap.toString(), detailsMap.remove("ephemeral_id"), equalTo("test-ephemeral-id"));
    assertThat(topLevelMap.toString(), detailsMap.remove("transport_address"), equalTo(transportAddress.toString()));
    assertThat(topLevelMap.toString(), detailsMap.remove("external_id"), withExternalId ? equalTo("test-external-id") : equalTo(""));
Should it not be test-name when not supplying an external id?
private DesiredNode newDesiredNode(String nodeName) {
    return new DesiredNode(
        Settings.builder().put(Node.NODE_NAME_SETTING.getKey(), nodeName).build(),
        16,
Let us randomize this too?
import java.util.HashSet;
import java.util.Set;

public class DesiredNodesMembershipTracker implements ClusterStateListener {
I think this needs to be a service in order for various components to be able to ask it for the members? We can also turn it into a service when we add the first use of this.
A few more things around master failover and removing nodes.
for (DiscoveryNode removedNode : nodesDelta.removedNodes()) {
    final var desiredNode = desiredNodes.find(removedNode.getExternalId());
    if (desiredNode != null) {
        members.remove(desiredNode);
    }
}
I think we should remove this. If we ever saw the node it is a member until a desired nodes update is made that removes it.
    }
}

members.removeAll(unknownDesiredNodes);
Let us also remove this.
@Override
public synchronized void clusterChanged(ClusterChangedEvent event) {
    if (event.localNodeMaster()) {
I wonder if we should just maintain this always, regardless of who is master? That would ensure it is readily available on failover. Otherwise, I worry that we need another special case for the failover case, i.e., when node was not previously master, since we only consider the delta of nodes below.
Yes, I think it should be fine if we keep track of this on all nodes, and as you mention it should simplify things.
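The agreed-upon behaviour can be sketched with plain sets (an illustrative stand-in, not the actual ClusterStateListener): every node recomputes membership from the full current state on each cluster state change, a desired node that was ever seen stays a member even if it drops out, and members are only removed when a desired nodes update no longer includes them. Recomputing from the full state rather than from deltas also means a freshly promoted master needs no special failover handling.

```java
import java.util.HashSet;
import java.util.Set;

public class MembershipTrackerSketch {
    private final Set<String> members = new HashSet<>();

    // Called on every cluster state change, on every node (not just the master).
    public synchronized void onClusterState(Set<String> desiredIds, Set<String> currentNodeIds) {
        // Only a desired-nodes update removes members.
        members.retainAll(desiredIds);
        for (String id : currentNodeIds) {
            if (desiredIds.contains(id)) {
                members.add(id); // once seen, a desired node stays a member
            }
        }
    }

    public synchronized Set<String> members() {
        return Set.copyOf(members);
    }
}
```

A node abruptly leaving the cluster thus does not shrink the member set, matching the earlier review comments about dropping the quarantine/removal paths.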
    }
}

public void testMasterDemotionClearsMembers() {
I think we need a test demonstrating that a promoted master has the right set of members?
@elasticmachine run elasticsearch-ci/part-1
LGTM.
final var previousDesiredNodes = DesiredNodes.latestFromClusterState(event.previousState());
if (previousDesiredNodes != null) {
Do we need this condition? Seems more expensive than just clearing members always?
}

// visible for testing
synchronized int trackedMembers() {
Suggested change:
-    synchronized int trackedMembers() {
+    synchronized int trackedMembersCount() {
@elasticmachine run elasticsearch-ci/part-2
Thanks Henning!
This commit adds desired nodes status tracking to the cluster state. Previously, status was tracked in-memory by DesiredNodesMembershipService; this approach had certain limitations and made the consumer code more complex. This commit takes a simpler approach, keeping the status updated when the desired nodes are updated or when a new node joins, and storing the status in the cluster state, which allows that information to be consumed easily where it is necessary. Additionally, this commit moves test code away from depending directly on DesiredNodes, which can be seen as an internal data structure, to rely more on UpdateDesiredNodesRequest. Relates #84165