Zen2: Deterministic MasterService #32493

Merged

ywelsch merged 18 commits into elastic:zen2 from ywelsch:testable-master-service

Aug 13, 2018

Contributor

ywelsch commented Jul 31, 2018

Increases testability of MasterService and the discovery layer. Changes:

Async publish method
Moved a few interfaces/classes top-level to simplify imports
Deterministic MasterService implementation for tests
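
To illustrate the first and third bullets together, here is a minimal sketch of a callback-driven publish method and a deterministic test double for it. All names (`Listener`, `Publisher`, `DeterministicPublisher`) are hypothetical stand-ins invented for this sketch; they are not the classes touched by this PR.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical completion callback (a simplified stand-in, not the real ActionListener).
interface Listener<T> {
    void onResponse(T result);
    void onFailure(Exception e);
}

// Hypothetical async publisher: publish() returns immediately and the outcome
// is reported later through the listener instead of a return value or exception.
interface Publisher {
    void publish(String newState, Listener<Void> publishListener);
}

// Deterministic test double: nothing completes until the test asks for it,
// so the order of completions is fully under the test's control.
class DeterministicPublisher implements Publisher {
    private final Deque<Listener<Void>> pending = new ArrayDeque<>();

    @Override
    public void publish(String newState, Listener<Void> publishListener) {
        pending.add(publishListener); // queued in submission order
    }

    // Completes the oldest pending publication successfully.
    void completeNext() {
        pending.remove().onResponse(null);
    }

    // Fails the oldest pending publication with the given exception.
    void failNext(Exception e) {
        pending.remove().onFailure(e);
    }
}
```

A test can call `publish` several times, assert that no listener has fired yet, and then drive each completion explicitly with `completeNext()` or `failNext(...)`.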

ywelsch added 6 commits

July 23, 2018 17:21


          copy stuff from zendisco2 branch

29e6bc8


          Use MasterService to execute ClusterStateChanges

2efb267


          add tests

82b366c


          add future

2f8b153


          Move inner classes to top-level classes

80de420


          checkstyle

8cc6d9e

ywelsch added >non-issue v7.0.0 :Distributed Coordination/Cluster Coordination labels

ywelsch requested a review from DaveCTurner

July 31, 2018 11:39

Collaborator

elasticmachine commented Jul 31, 2018

Pinging @elastic/es-distributed

ywelsch mentioned this pull request

A new cluster coordination layer #32006

Closed

61 tasks

ywelsch added 2 commits

July 31, 2018 16:51


          fix test because it tried cluster state update when not being master

dfce0a8


          Merge remote-tracking branch 'elastic/zen2' into testable-master-service

4b41773

DaveCTurner reviewed

Contributor

DaveCTurner left a comment

I started to review this but did not get very far so only have superficial comments here. To be continued...

server/src/main/java/org/elasticsearch/discovery/ClusterStatePublisher.java Outdated

+               * specific language governing permissions and limitations
+               * under the License.
+               */
+              package org.elasticsearch.discovery;

Contributor

DaveCTurner Aug 3, 2018

Think I'd prefer this to be in org.elasticsearch.cluster.coordination.

Contributor Author

ywelsch Aug 13, 2018

fixed in 40d7c95

server/src/main/java/org/elasticsearch/discovery/ClusterStatePublisher.java Outdated

+                   * The {@link AckListener} allows to keep track of the ack received from nodes, and verify whether
+                   * they updated their own cluster state or not.
+                   *
+                   * The method is guaranteed to throw a {@link FailedToCommitClusterStateException} if the change is not committed and should be

Contributor

DaveCTurner Aug 3, 2018

I think "throw" here now means "pass to publishListener::onFailure".

Contributor Author

ywelsch Aug 13, 2018

fixed in 40d7c95

server/src/main/java/org/elasticsearch/discovery/ClusterStatePublisher.java Outdated

+                   * Publish all the changes to the cluster from the master (can be called just by the master). The publish
+                   * process should apply this state to the master as well!
+                   *
+                   * The publishListener allows to wait for the publication to go through.

Contributor

DaveCTurner Aug 3, 2018

"go through" meaning complete/fail/timeout?

Contributor Author

ywelsch Aug 13, 2018

fixed in 40d7c95

server/src/main/java/org/elasticsearch/discovery/ClusterStatePublisher.java Outdated

+                  interface AckListener {
+                      /**
+                       * Should be called when the discovery layer has committed the clusters state (i.e. even if this publication fails,

Contributor

DaveCTurner Aug 3, 2018

s/discovery/coordination/?

Contributor Author

ywelsch Aug 13, 2018

fixed in 40d7c95

server/src/main/java/org/elasticsearch/discovery/ClusterStatePublisher.java Outdated

+                      void onCommit(TimeValue commitTime);
+                      /**
+                       * Should be called whenever the discovery layer receives confirmation from a node that it has successfully applied

Contributor

DaveCTurner Aug 3, 2018

s/discovery/coordination/?

Contributor Author

ywelsch Aug 13, 2018

fixed in 40d7c95

DaveCTurner previously requested changes

Contributor

DaveCTurner left a comment

The change to ZenDiscovery doesn't look right. Also some other minor comments.

server/src/main/java/org/elasticsearch/ElasticsearchException.java Outdated

@@ -1006,8 +1007,8 @@ public String toString() {
                           UNKNOWN_VERSION_ADDED),
                       TYPE_MISSING_EXCEPTION(org.elasticsearch.indices.TypeMissingException.class,
                               org.elasticsearch.indices.TypeMissingException::new, 137, UNKNOWN_VERSION_ADDED),
-                      FAILED_TO_COMMIT_CLUSTER_STATE_EXCEPTION(org.elasticsearch.discovery.Discovery.FailedToCommitClusterStateException.class,
-                              org.elasticsearch.discovery.Discovery.FailedToCommitClusterStateException::new, 140, UNKNOWN_VERSION_ADDED),
+                      FAILED_TO_COMMIT_CLUSTER_STATE_EXCEPTION(FailedToCommitClusterStateException.class,

Contributor

DaveCTurner Aug 6, 2018

Almost all of these registrations use the fully-qualified class name (except CoordinationStateRejectedException, oops) so it looks like this should too.

Contributor Author

ywelsch Aug 13, 2018

right, also fixed for CoordinationStateRejectedException in d3c5a3d

server/src/main/java/org/elasticsearch/discovery/FailedToCommitClusterStateException.java Outdated

+               * specific language governing permissions and limitations
+               * under the License.
+               */
+              package org.elasticsearch.discovery;

Contributor

DaveCTurner Aug 6, 2018

I think this should be in org.elasticsearch.cluster.coordination.

Contributor Author

ywelsch Aug 13, 2018

fixed in d3c5a3d

server/src/main/java/org/elasticsearch/discovery/zen/ZenDiscovery.java Outdated

		@@ -385,14 +387,6 @@ public void onNewClusterStateFailed(Exception e) {
		return;

Contributor

DaveCTurner Aug 6, 2018

This, and the containing synchronised block, don't look right. They throw FailedToCommitClusterStateException rather than passing it to the publishListener, and previously they returned early without blocking on the publication but the equivalent flow now would be to call publishListener.onResponse(null) early somehow.

Contributor Author

ywelsch Aug 13, 2018

fixed in 506608c
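
The concern raised above can be sketched as follows: once publication is listener-based, every exit path has to complete the listener exactly once, either with `onFailure` (instead of throwing) or with `onResponse(null)` (instead of silently returning early). The interface, the class, and the two boolean conditions below are hypothetical and chosen only for illustration; this is not the ZenDiscovery code.

```java
// Hypothetical stand-in for the publish listener discussed in the review.
interface PublishListener {
    void onResponse(Void result);
    void onFailure(Exception e);
}

class PublishFlowSketch {
    // `canCommit` and `nothingToPublish` are made-up conditions for this sketch.
    static void publish(boolean canCommit, boolean nothingToPublish, PublishListener publishListener) {
        if (!canCommit) {
            // Listener style: hand the failure to the callback rather than throwing it.
            publishListener.onFailure(new IllegalStateException("failed to commit cluster state"));
            return;
        }
        if (nothingToPublish) {
            // An early return must still complete the listener, or the caller waits forever.
            publishListener.onResponse(null);
            return;
        }
        // Normal path. A real implementation would complete the listener later,
        // once the publication has finished; here it is completed inline.
        publishListener.onResponse(null);
    }
}
```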

server/src/test/java/org/elasticsearch/indices/cluster/ClusterStateChanges.java Outdated

                       DestructiveOperations destructiveOperations = new DestructiveOperations(settings, clusterSettings);
                       Environment environment = TestEnvironment.newEnvironment(settings);
                       Transport transport = mock(Transport.class); // it's not used
+                      nextMasterTaskToRun = new AtomicReference<>();
+                      FakeThreadPoolMasterService masterService = new FakeThreadPoolMasterService("fake-master", nextMasterTaskToRun::set);

Contributor

DaveCTurner Aug 6, 2018

Does this need any kind of assertion that nextMasterTaskToRun isn't already set?
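
One way such a check could look, as a sketch only: replace the bare `AtomicReference::set` with a small holder that refuses to overwrite a task that has not run yet. `SingleTaskSlot` is a hypothetical helper, not something in this PR.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder for "the next task to run" that rejects overwrites.
class SingleTaskSlot {
    private final AtomicReference<Runnable> next = new AtomicReference<>();

    // Stores the task only if the slot is empty; otherwise fails loudly.
    void set(Runnable task) {
        if (!next.compareAndSet(null, task)) {
            throw new IllegalStateException("a task is already pending");
        }
    }

    // Runs and clears the pending task; returns false if there was none.
    boolean runPending() {
        Runnable task = next.getAndSet(null);
        if (task == null) {
            return false;
        }
        task.run();
        return true;
    }
}
```

With this shape, `slot::set` can be passed wherever `nextMasterTaskToRun::set` was used, and a second submission before `runPending()` surfaces as an exception instead of silently dropping the first task.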


          odd stuff

506608c

ywelsch requested a review from bleskes

August 6, 2018 15:26

DaveCTurner dismissed their stale review

August 7, 2018 11:50

LGTM after 506608c, but needs another reviewer.

bleskes reviewed

Contributor

bleskes left a comment

I left some minor comments. The main change LGTM. The only concern I had, as discussed with @ywelsch, is that the integration of FakeThreadPoolMasterService with ClusterStateChanges is a bit clunky. It is my understanding that FakeThreadPoolMasterService is a very useful testing component for other parts of the work, but in that case I'd rather not use it (as is) with ClusterStateChanges.

server/src/main/java/org/elasticsearch/cluster/service/MasterService.java Outdated

+                  protected void publish(ClusterChangedEvent clusterChangedEvent, TaskOutputs taskOutputs, long startTimeNS) throws Exception {
+                      CompletableFuture<Void> fut = new CompletableFuture<>();
+                      clusterStatePublisher.publish(clusterChangedEvent, new ActionListener<Void>() {

Contributor

bleskes Aug 8, 2018

You can use ActionListener#wrap to make this slightly more compact. Also, I presume you consciously chose CompletableFuture over PlainActionFuture?

Contributor Author

ywelsch Aug 13, 2018

The reason I did not choose PlainActionFuture was that it asserts that we're not blocking on the MasterServiceUpdateThread (which this future deliberately does). Unfortunately, CompletableFuture has other problems (#32512 (comment)), so I've gone back to PlainActionFuture but added a hook that allows some of the assertion checks to be disabled; see 526511d.
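
The compactness gained from a `wrap`-style factory can be sketched with stand-in types. `Listener.wrap` and `WrapDemo` below are hypothetical and only mirror the idea of `ActionListener#wrap`; the sketch bridges to a `CompletableFuture` as in the snippet under review, even though the PR ultimately went back to `PlainActionFuture`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical stand-in for a completion callback.
interface Listener<T> {
    void onResponse(T result);
    void onFailure(Exception e);

    // Builds a listener from two lambdas instead of an anonymous class.
    static <T> Listener<T> wrap(Consumer<T> responseHandler, Consumer<Exception> failureHandler) {
        return new Listener<T>() {
            @Override
            public void onResponse(T result) {
                responseHandler.accept(result);
            }

            @Override
            public void onFailure(Exception e) {
                failureHandler.accept(e);
            }
        };
    }
}

class WrapDemo {
    // Adapts a callback-style operation into a future using method references.
    static CompletableFuture<Void> publishAsFuture(Consumer<Listener<Void>> publisher) {
        CompletableFuture<Void> fut = new CompletableFuture<>();
        publisher.accept(Listener.wrap(fut::complete, fut::completeExceptionally));
        return fut;
    }
}
```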

server/src/main/java/org/elasticsearch/cluster/service/MasterService.java Outdated

+                          }
+                      }, taskOutputs.createAckListener(threadPool, clusterChangedEvent.state()));
+                      final ActionListener<Void> publishListener = getPublishListener(clusterChangedEvent, taskOutputs, startTimeNS);

Contributor

bleskes Aug 10, 2018

Why do we need this extra listener construct? At the moment it's activated fully sequentially. Wouldn't it be simpler to just process the results of the future inline?

Contributor Author

ywelsch Aug 13, 2018

The extra listener construct is not needed. I've changed this in c84ddf7

server/src/main/java/org/elasticsearch/cluster/service/MasterService.java Outdated

                       return newClusterState;
                   }
+                  public Builder incrementVersion(ClusterState clusterState) {

Contributor

bleskes Aug 10, 2018

this can be protected

Contributor Author

ywelsch Aug 13, 2018

fixed in 526511d

.../test/java/org/elasticsearch/action/support/replication/TransportReplicationActionTests.java

                   public void testClosedIndexOnReroute() throws InterruptedException {
                       final String index = "test";
                       // no replicas in oder to skip the replication part
-                      setState(clusterService, new ClusterStateChanges(xContentRegistry(), threadPool).closeIndices(state(index, true,
-                          ShardRoutingState.UNASSIGNED), new CloseIndexRequest(index)));
+                      ClusterStateChanges clusterStateChanges = new ClusterStateChanges(xContentRegistry(), threadPool);

Contributor

bleskes Aug 10, 2018

Wondering - why is this change needed?

Contributor Author

ywelsch Aug 13, 2018

The reason was that we made ClusterStateChanges more realistic now by introducing MasterService to it. This test was using a cluster state where the local node was not the master. As ClusterStateChanges now used the proper MasterService, it simply rejected the cluster state update to close the indices.

server/src/test/java/org/elasticsearch/discovery/zen/ZenDiscoveryUnitTests.java Outdated

+                      @Override
+                      public void onResponse(Void aVoid) {
+                          assertThat(countDownLatch.getCount(), is(1L));

Contributor

bleskes Aug 10, 2018

maybe synchronize this method?

Contributor Author

ywelsch Aug 13, 2018

fixed in b9d407f
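
For context on why synchronizing helps: checking the latch count and then counting down is a check-then-act sequence, so two concurrent completions could both pass the check if the methods were not mutually exclusive. A minimal sketch follows; `AwaitingListener` is a hypothetical name and this is not the test's actual listener.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical listener that may be completed exactly once.
class AwaitingListener {
    private final CountDownLatch latch = new CountDownLatch(1);
    private Exception failure;

    // synchronized makes "check count, then count down" atomic across both callbacks.
    public synchronized void onResponse() {
        ensureNotCompleted();
        latch.countDown();
    }

    public synchronized void onFailure(Exception e) {
        ensureNotCompleted();
        failure = e;
        latch.countDown();
    }

    private void ensureNotCompleted() {
        if (latch.getCount() != 1L) {
            throw new IllegalStateException("listener completed more than once");
        }
    }

    public boolean isDone() {
        return latch.getCount() == 0L;
    }

    public synchronized Exception failure() {
        return failure;
    }

    // Blocks the caller until one of the callbacks has run.
    public void await() throws InterruptedException {
        latch.await();
    }
}
```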

ywelsch added 7 commits

August 13, 2018 11:03


          Merge remote-tracking branch 'elastic/zen2' into testable-master-service

05c9455


          First round feedback on ClusterStatePublisher

40d7c95


          Moved FailedToCommitClusterStateException

d3c5a3d


          use PlainActionFuture

526511d


          remove publishListener concept in MasterService

c84ddf7


          properly synchronize AwaitingPublishListener

b9d407f


          Revert ClusterStateChanges

e4bd482

Contributor Author

ywelsch commented Aug 13, 2018

> It is my understanding that FakeThreadPoolMasterService is a very useful testing component for other parts of the work, but in that case I'd rather not use it (as is) with ClusterStateChanges.

Ok, I've reverted this in e4bd482


          fix merge conflicts

6bc69f6

DaveCTurner reviewed

Contributor

DaveCTurner left a comment

I left a couple of nits but the extra changes still LGTM.

server/src/test/java/org/elasticsearch/ExceptionSerializationTests.java Outdated

@@ -801,7 +802,7 @@ public void testIds() {
                       ids.put(137, org.elasticsearch.indices.TypeMissingException.class);
                       ids.put(138, null);
                       ids.put(139, null);
-                      ids.put(140, org.elasticsearch.discovery.Discovery.FailedToCommitClusterStateException.class);
+                      ids.put(140, FailedToCommitClusterStateException.class);

Contributor

DaveCTurner Aug 13, 2018

I think this should still be fully-qualified.

Contributor Author

ywelsch Aug 13, 2018

urgs... fixed in a0030c0

server/src/main/java/org/elasticsearch/cluster/service/MasterService.java Outdated

+                      final PlainActionFuture<Void> fut = new PlainActionFuture<Void>() {
+                          @Override
+                          protected boolean blockingAllowed() {
+                              // allow this one to block on the MasterServiceUpdateThread

Contributor

DaveCTurner Aug 13, 2018

This comment would be unnecessary if we wrote something like:

return Thread.currentThread().getName().contains(MASTER_UPDATE_THREAD_NAME) || super.blockingAllowed();

Contributor Author

ywelsch Aug 13, 2018

great idea. fixed in a0030c0
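
The suggested override can be tried out in isolation with a stand-in base class. In the sketch below, `GuardedFuture`, its default policy, `BlockingDemo`, and the value of `MASTER_UPDATE_THREAD_NAME` are all assumptions made for illustration; only the shape of the overridden `blockingAllowed()` follows the suggestion above.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical future with an overridable "may this thread block?" hook.
class GuardedFuture {
    private final CountDownLatch done = new CountDownLatch(1);

    // Assumed default policy: refuse to block on update-task threads.
    protected boolean blockingAllowed() {
        return !Thread.currentThread().getName().contains("#updateTask");
    }

    void complete() {
        done.countDown();
    }

    void awaitDone() throws InterruptedException {
        if (!blockingAllowed()) {
            throw new IllegalStateException("blocking not allowed on " + Thread.currentThread().getName());
        }
        done.await();
    }
}

// Relaxes the policy for the master update thread only; the code explains itself.
class MasterPublishFuture extends GuardedFuture {
    static final String MASTER_UPDATE_THREAD_NAME = "masterService#updateTask"; // assumed value

    @Override
    protected boolean blockingAllowed() {
        return Thread.currentThread().getName().contains(MASTER_UPDATE_THREAD_NAME) || super.blockingAllowed();
    }
}

class BlockingDemo {
    // Evaluates the future's policy on a freshly started thread with the given name.
    static boolean allowedOn(String threadName, GuardedFuture future) {
        boolean[] result = new boolean[1];
        Thread t = new Thread(() -> result[0] = future.blockingAllowed(), threadName);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }
}
```

Falling back to `super.blockingAllowed()` keeps every other check of the base class in force, so the override widens the policy for one thread name without disabling it elsewhere.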


          feedback

a0030c0

ywelsch merged commit e122505 into elastic:zen2

colings86 added v7.0.0-beta1 and removed v7.0.0 labels

Labels

:Distributed Coordination/Cluster Coordination >non-issue v7.0.0-beta1