Keep track of desired nodes status in cluster state #87474

Merged: 23 commits into elastic:master on Jun 16, 2022

Conversation

@fcofdez fcofdez commented Jun 7, 2022

This commit adds desired nodes status tracking to the cluster state. Previously the status was tracked
in memory by DesiredNodesMembershipService; that approach had certain limitations and made
the consumer code more complex. This commit takes a simpler approach: the status is kept up to date when
the desired nodes are updated or when a new node joins, and it is stored in the cluster state,
which makes the information easy to consume wherever it is needed.
Additionally, this commit moves test code away from depending directly on DesiredNodes, which can be
seen as an internal data structure, and relies more on UpdateDesiredNodesRequest.

Relates #84165
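
For orientation, a minimal sketch of the idea, using simplified, illustrative types rather than the PR's actual classes:

import java.util.Map;

// Illustrative only: each desired node carries a status that lives in the cluster state,
// so consumers read it from the published metadata instead of an in-memory service.
enum Status { PENDING, ACTUALIZED }

record DesiredNodeWithStatus(String externalId, Status status) {}

record DesiredNodesMetadata(long version, Map<String, DesiredNodeWithStatus> nodes) {}

// The master recomputes the statuses whenever the desired nodes are updated or a node joins,
// and publishes the result as part of the next cluster state.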

@fcofdez fcofdez force-pushed the desired-nodes-status-cluster-state branch from e002c53 to 319e9d6 on June 8, 2022 08:57
@fcofdez fcofdez changed the title from "Keep track of desired nodes membership in cluster state" to "Keep track of desired nodes status in cluster state" on Jun 8, 2022
fcofdez commented Jun 8, 2022

I opened fcofdez#1 with the DataTierAllocationDecider changes on top of this PR to see what the changes look like and get two separate PRs.

@fcofdez fcofdez marked this pull request as ready for review June 8, 2022 09:56
@fcofdez fcofdez requested a review from henningandersen June 8, 2022 09:57
fcofdez commented Jun 8, 2022

I'm checking why the upgrade test is failing, but I think the PR is ready for review.

@fcofdez fcofdez added the Team:Distributed (Obsolete) label on Jun 8, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Collaborator

Hi @fcofdez, I've created a changelog YAML for you.

@sethmlarson sethmlarson added the Team:Clients label on Jun 8, 2022
@elasticmachine
Collaborator

Pinging @elastic/clients-team (Team:Clients)

fcofdez added 2 commits June 8, 2022 14:49
…ez/elasticsearch into desired-nodes-status-cluster-state
@henningandersen henningandersen left a comment

I read through most of the production code; it looks like a good direction.

Comment on lines 249 to 251
if (desiredNodes == null) {
return clusterState;
}
Contributor

This null check seems redundant?

return clusterState;
}

final Map<String, DesiredNodeWithStatus> updatedStateDesiredNodes = new HashMap<>(desiredNodes.nodes);
Contributor

Can we make this map null initially, initialize it during the loop and use the non-nullness instead of the statusModified flag? Doing so avoids some work for all the cases where there is no change.
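
Roughly the suggested shape (a sketch with illustrative types, not the actual diff):

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Allocate the copy lazily; its non-nullness replaces the separate statusModified flag,
// so the no-change case does no extra work.
static Map<String, Status> updatedStatuses(Map<String, Status> current, Set<String> joinedNodeIds) {
    Map<String, Status> updated = null;
    for (Map.Entry<String, Status> entry : current.entrySet()) {
        if (entry.getValue() == Status.PENDING && joinedNodeIds.contains(entry.getKey())) {
            if (updated == null) {
                updated = new HashMap<>(current); // first change: copy once
            }
            updated.put(entry.getKey(), Status.ACTUALIZED);
        }
    }
    return updated; // null means "no change"; the caller can return the original cluster state
}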

if (in.getVersion().onOrAfter(STATUS_TRACKING_SUPPORT_VERSION)) {
status = Status.fromValue(in.readShort());
} else {
status = Status.defaultStatus();
Contributor

I wonder if PENDING is the right status here? I would rather assume that information from previous versions was actualized, since a forever-pending node is unexpected.

Contributor Author

Yes, that makes sense, and I think we should get a new desired nodes version in this scenario since the cluster is upgraded at that point.

Contributor Author

One downside of using ACTUALIZED here is that we might end up making a decision based on made-up information, i.e. a node in a certain tier was supposed to join the cluster but never did, and we might decide to move some shards to that nonexistent tier? Maybe I'm overthinking this...

Contributor

Yes, that is true. I think we could perhaps fix this by also updating actualized state in JoinTaskExecutor.becomeMasterAndTrimConflictingNodes?

Contributor Author

We already cover that: when we call becomeMasterAndTrimConflictingNodes we set nodesChanged, and we update the desired nodes status when any of the nodes have changed.
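
For context on the wire handling discussed in this thread, the write side would be gated on the same version constant; a sketch (the enum's wire-value accessor is an assumption):

if (out.getVersion().onOrAfter(STATUS_TRACKING_SUPPORT_VERSION)) {
    out.writeShort(status.getValue()); // hypothetical accessor; older nodes never receive the status
}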

(ByteSizeValue) args[4],
(Version) args[5]
),
args[6] == null ? Status.defaultStatus() : (Status) args[6]
Contributor

I'd find it easier to read to put Status.PENDING here; I am not sure the defaultStatus method adds value. At least I had to go and look up which constant it returns.

I also wonder if this case should use actualized instead. I suppose a master reading this from xcontent would see all nodes join, so this might be ok here; do you have input on this?

Contributor Author

My reasoning here was that we would end up updating to the proper status as the cluster is upgraded and nodes re-join the cluster (we would reconcile the status at that point), so if there's a missing node we'll still see it as PENDING.

Contributor

On a rolling upgrade, I would think that an upgraded master keeps the list of nodes in cluster state and thus the way we update desired nodes inside the if (nodesChanged) in JoinTaskExecutor will (I think) not work to fix this. The new master will be joined by a number of existing nodes.

We should probably write a test case to demonstrate that this works. A rolling-upgrade-style test with multiple nodes should not be too bad to write; we just need to set desired nodes on the old version, then do a rolling upgrade, then check that they are all actualized on the new version. Happy to go some other route too.

Contributor

Sorry, the above is mostly relevant for the stream-reading case. Clearly, if we read from xcontent, we would be joined by all nodes, so this case does sound good. Still, it would be very nice to test it.

Contributor Author

> We should probably write a test case to demonstrate that this works. A rolling-upgrade-style test with multiple nodes should not be too bad to write; we just need to set desired nodes on the old version, then do a rolling upgrade, then check that they are all actualized on the new version. Happy to go some other route too.

DesiredNodesUpgradeIT tests that scenario; I've just added an assertion to ensure that all nodes are ACTUALIZED once the cluster is upgraded.
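
Going back to the parser snippet at the top of this thread, the optional status argument can be declared along these lines (a sketch; the field name and the chosen default are assumptions):

// Declared as an optional constructor arg so documents written by older versions still parse;
// when the field is absent, args[6] is null and the default status is used.
PARSER.declareField(
    ConstructingObjectParser.optionalConstructorArg(),
    p -> Status.fromValue(p.shortValue()),
    new ParseField("status"),
    ObjectParser.ValueType.INT
);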

"GET"
]
}
]
},
"params": {
"include_status": {
Contributor

I wonder if we need this flag? I think we discussed it but I do not precisely recall the conclusion.

It sort of increases our BWC surface, and I doubt anyone will ever want to not see the status. An orchestration system should build up its next desired nodes independently of GET _internal/desired_nodes.

Contributor Author

I agree, it doesn't add much value. I added the flag mostly to ensure that upgrades work as expected, so not a very good reason to increase our BWC surface.


@Override
public void onFailure(Exception e) {
if (MasterService.isPublishFailureException(e)) {
Contributor

Is this essential? I am not sure I follow, i.e., if reroute fails for reasons other than publishing, it seems like a bug in our code?

It could simplify all of this a lot if we just succeeded all task-contexts and overrode clusterStatePublished on the executor to do the reroute, what would go wrong if we did this?

Contributor Author

> It could simplify all of this a lot if we just succeeded all task-contexts and overrode clusterStatePublished on the executor to do the reroute, what would go wrong if we did this?

I agree, this is a bit complex, but it has the nice property of notifying the update-desired-nodes listeners after the reroute finishes. It's true that this is not super important for orchestrators though; it mostly simplifies our testing code. I'll take your approach.
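
The resulting shape, roughly (a sketch; the exact executor hook and reroute-service signatures vary across versions, so treat the parameters here as assumptions):

// Every task context is marked as succeeded in execute(); the reroute runs once the new
// state has actually been published, with no per-task failure handling needed.
@Override
public void clusterStatePublished(ClusterState newClusterState) { // assumed signature
    rerouteService.reroute("desired nodes updated", Priority.NORMAL, ActionListener.noop());
}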

@fcofdez fcofdez requested a review from henningandersen June 13, 2022 11:38
@henningandersen henningandersen left a comment

I went through most of the code and wanted to provide my comments now. This direction looks good, though I fear there is an edge case we are not catching; see comments.

@fcofdez fcofdez requested a review from henningandersen June 15, 2022 07:36
@henningandersen henningandersen left a comment

Left a few final comments; otherwise this looks good to me.

@fcofdez fcofdez requested a review from henningandersen June 15, 2022 18:39
@henningandersen henningandersen left a comment

LGTM, thanks for all your efforts on this.

@fcofdez fcofdez merged commit eb8c4ba into elastic:master Jun 16, 2022
fcofdez commented Jun 16, 2022

Thanks for the review Henning!

Labels
:Distributed Coordination/Autoscaling, >enhancement, Team:Clients, Team:Distributed (Obsolete), v8.4.0