Skip to content

Keep track of desired nodes cluster membership #84165

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
May 3, 2022

Conversation

fcofdez
Copy link
Contributor

@fcofdez fcofdez commented Feb 18, 2022

This commit adds tracking for desired nodes cluster membership.

When desired nodes are updated they are matched against the current
cluster members. Additionally when a node joins the cluster the
desired nodes cluster membership is updated.

@fcofdez fcofdez added >enhancement :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.2.0 labels Feb 18, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Copy link
Collaborator

Hi @fcofdez, I've created a changelog YAML for you.

@@ -140,8 +140,13 @@ static ClusterState updateDesiredNodes(ClusterState currentState, UpdateDesiredN
}
}

DesiredNodesMetadata.Builder desiredNodesMetadataBuilder = new DesiredNodesMetadata.Builder(proposedDesiredNodes);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should take into account previous desired nodes marked as members. I'll fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tackled in 42d504a

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not get through it all, but wanted to relay a few early comments here.

history_id: "test"
version: 2
nodes:
- { settings: { node: { name: "instance-000187" } }, processors: 8, memory: "64gb", storage: "128gb", node_version: $es_version }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new contains assertion no longer catches if there are too many desired nodes, perhaps we can add a check that there are only 2?

It looks like we are trying to preserve order anyway, so perhaps we could even keep matching against the list?

final String historyId = in.readString();
final long version = in.readLong();
if (in.getVersion().onOrAfter(Version.CURRENT)) {
return new DesiredNodes(historyId, version, in.readMap(StreamInput::readString, DesiredNode::new));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should preserve order here too? Or perhaps even switch to an ordered map (TreeeMap) to ensure a consistent ordering?

I wonder if we even need to change the serialization here? I think we are sending the externalId twice now and might as well send it as a list and then reconstruct it on the receiver side?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about that too. I would expect that the number of desired nodes would be fairly low in most cases. Let me revert this change.

- contains: { nodes: { settings: { node: { external_id: "instance-000187" } }, processors: 8, memory: "64gb", storage: "128gb", node_version: $es_version } }
- contains: { nodes: { settings: { node: { external_id: "instance-000188" } }, processors: 8, memory: "64gb", storage: "128gb", node_version: $es_version } }
---
"Test update desired nodes is idempotent with different order":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 , but would be nice to even split this out into a separate PR? Or is the change too coupled with the rest of this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me split this PR 👍

Copy link
Contributor Author

@fcofdez fcofdez Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened a new PR #84227 with these changes and removed them from this PR. I think is better to hold this PR review until we get that one merged.

@fcofdez fcofdez force-pushed the desired-nodes-membership branch from 1446da3 to 66532d8 Compare February 22, 2022 15:13
@elasticsearchmachine
Copy link
Collaborator

Hi @fcofdez, I've created a changelog YAML for you.

@fcofdez fcofdez marked this pull request as draft February 24, 2022 15:59
@fcofdez
Copy link
Contributor Author

fcofdez commented Feb 24, 2022

I'm moving this to draft since I found a bug while I was working in a related PR

This commit adds tracking for desired nodes cluster membership.

When desired nodes are updated they are matched against the current
cluster members. Additionally when a node joins the cluster the
desired nodes cluster membership is updated.
@fcofdez fcofdez force-pushed the desired-nodes-membership branch from d9d76b8 to 35d68ef Compare February 25, 2022 15:17
@fcofdez fcofdez marked this pull request as ready for review February 25, 2022 17:36
@elasticsearchmachine
Copy link
Collaborator

Hi @fcofdez, I've created a changelog YAML for you.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay here. I wonder if we should make cluster membership a dynamically calculated property instead rather than persist it into metadata? For instance the current PR will IIUC keep all nodes as members if all nodes are abruptly shutdown and only some are started? Also at a higher level, I think knowledge about the current nodes in the cluster should not go into metadata to avoid persisting such information.

I can see a potential need for whether we ever saw a desired node, which would need to be in metadata. But I think the purpose here is different?

Left a couple random comments, not too important at this time.

}

private final DesiredNodes latestDesiredNodes;
private final Set<DesiredNode> clusterMembers;
private final Set<DesiredNode> notClusterMembers;

public DesiredNodesMetadata(DesiredNodes latestDesiredNodes) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make this private now that we have the Builder?

private DesiredNodes desiredNodes;
private final Set<DesiredNode> clusterMembers;

public Builder(DesiredNodesMetadata desiredNodesMetadata) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We often add a builder() method to the outer class. I like that, since it make the client code slightly shorter and reads slightly better I think.

@fcofdez fcofdez requested a review from henningandersen April 25, 2022 10:11
@fcofdez
Copy link
Contributor Author

fcofdez commented Apr 25, 2022

@henningandersen I've followed a different approach where the membership is tracked in-memory, let me know if you prefer this approach. Thanks!

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this directions looks fine. I am not sure we need the quarantining/member removal though, I think we should simply remove that.

@@ -34,7 +36,7 @@
import static java.lang.String.format;
import static org.elasticsearch.node.Node.NODE_EXTERNAL_ID_SETTING;

public class DesiredNodes implements Writeable, ToXContentObject {
public class DesiredNodes extends AbstractCollection<DesiredNode> implements Writeable, ToXContentObject {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most of the uses here can work with just Iterable and I prefer that to not add a bunch of methods to the DesiredNodes interface that does not work.

Comment on lines 67 to 70
for (DiscoveryNode removedNode : nodesDelta.removedNodes()) {
final var desiredNode = desiredNodes.find(removedNode.getExternalId());
moveToQuarantineIfMember(desiredNode);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is necessary. If we ever saw a desired node and it disappears, we expect it to reappear or a new desired nodes state to be set. Anything else is a bug in the orchestration?

From the data tier allocation decider, we start allocating shards to the cold node(s) once we see the first one. But in case a node then unexpectedly drops out, I think we would not want to start allocating them to less preferred tiers. That could potentially cause issues on higher tiers more than help. And it would sort of hide that there is an issue, since if you look at health and allocation, everything looks good.

@@ -37,6 +38,7 @@
public class DiscoveryNode implements Writeable, ToXContentFragment {

static final String COORDINATING_ONLY = "coordinating_only";
public static final Version EXTERNAL_ID_VERSION = Version.CURRENT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd normally put in the specific version here during development to not have to remember to alter it before merging it.

@@ -205,6 +207,7 @@ public void testDiscoveryNodeToXContent() {
assertThat(topLevelMap.toString(), detailsMap.remove("name"), equalTo("test-name"));
assertThat(topLevelMap.toString(), detailsMap.remove("ephemeral_id"), equalTo("test-ephemeral-id"));
assertThat(topLevelMap.toString(), detailsMap.remove("transport_address"), equalTo(transportAddress.toString()));
assertThat(topLevelMap.toString(), detailsMap.remove("external_id"), withExternalId ? equalTo("test-external-id") : equalTo(""));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it not be test-name when not supplying external id?

private DesiredNode newDesiredNode(String nodeName) {
return new DesiredNode(
Settings.builder().put(Node.NODE_NAME_SETTING.getKey(), nodeName).build(),
16,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us randomize this too?

import java.util.HashSet;
import java.util.Set;

public class DesiredNodesMembershipTracker implements ClusterStateListener {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be a service in order for various components to be able to ask it for the members? We can also turn it into a service when we add the first use of this.

@fcofdez fcofdez requested a review from henningandersen April 29, 2022 14:30
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more things around master failover and removing nodes.

Comment on lines 52 to 57
for (DiscoveryNode removedNode : nodesDelta.removedNodes()) {
final var desiredNode = desiredNodes.find(removedNode.getExternalId());
if (desiredNode != null) {
members.remove(desiredNode);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove this. If we ever saw the node it is a member until a desired nodes update is made that removes it.

}
}

members.removeAll(unknownDesiredNodes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us also remove this.


@Override
public synchronized void clusterChanged(ClusterChangedEvent event) {
if (event.localNodeMaster()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should just maintain this always, regardless of who is master? That would ensure it is readily available on failover. Otherwise, I worry that we need another special case for the failover case, i.e., when node was not previously master, since we only consider the delta of nodes below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it should be fine if we keep track of this in all nodes and as you mention it should simplify things

}
}

public void testMasterDemotionClearsMembers() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a test demonstrating that a promoted master has the right set of members?

@fcofdez
Copy link
Contributor Author

fcofdez commented May 2, 2022

@elasticmachine run elasticsearch-ci/part-1
(It was a Gradle connectivity issue)

@fcofdez fcofdez requested a review from henningandersen May 2, 2022 14:54
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Comment on lines 38 to 39
final var previousDesiredNodes = DesiredNodes.latestFromClusterState(event.previousState());
if (previousDesiredNodes != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this condition? Seems more expensive than just clearing members always?

}

// visible for testing
synchronized int trackedMembers() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
synchronized int trackedMembers() {
synchronized int trackedMembersCount() {

@fcofdez
Copy link
Contributor Author

fcofdez commented May 3, 2022

@elasticmachine run elasticsearch-ci/part-2
(Unrelated LDAP failure)

@fcofdez fcofdez merged commit ce9819f into elastic:master May 3, 2022
@fcofdez
Copy link
Contributor Author

fcofdez commented May 3, 2022

Thanks Henning!

fcofdez added a commit that referenced this pull request Jun 16, 2022
This commit adds desired nodes status tracking to the cluster state. Previously status was tracked
in-memory by DesiredNodesMembershipService this approach had certain limitations, and made
the consumer code more complex. This takes a simpler approach to keep the status updated when
the desired nodes are updated or when a new node joins, storing the status in the cluster state,
this allows to consume that information easily where it is necessary.
Additionally, this commit moves test code from depending directly of DesiredNodes which can be
seen as an internal data structure to rely more on UpdateDesiredNodesRequest.

Relates #84165
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.3.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants