Skip to content

ClusterStateTaskListener usage refactoring in MasterServiceTests #82869

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jan 24, 2022

Conversation

idegtiarenko
Copy link
Contributor

@idegtiarenko idegtiarenko commented Jan 20, 2022

Today node removal tasks executed by the master have a separate
ClusterStateTaskListener to feed back the result to the requester.
It'd be preferable to use the task itself as the listener.

Rel: #82644

Today node removal tasks executed by the master have a separate
ClusterStateTaskListener to feed back the result to the requester.
It'd be preferable to use the task itself as the listener.
@idegtiarenko idegtiarenko added >refactoring :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.1.0 labels Jan 20, 2022
@idegtiarenko idegtiarenko marked this pull request as ready for review January 20, 2022 17:17
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I left some comments.

int numberOfExecutors = Math.max(1, numberOfThreads / 4);
final int numberOfThreads = randomIntBetween(2, 8);
final int taskSubmissionsPerThread = randomIntBetween(1, 64);
final int numberOfExecutors = Math.max(1, numberOfThreads / 4);
final Semaphore semaphore = new Semaphore(numberOfExecutors);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This semaphore seems unnecessary to me, maybe I'm missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ClusterStateTaskListener#clusterStateProcessed is called after the ClusterStateTaskExecutor#clusterStatePublished then we can guarantee all tasks are completed just with processedStatesLatch. Let me confirm this assumption

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was missing something: I didn't see the final call to Semaphore#acquire 🤦

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the order of execution is:

  • ClusterStateTaskExecutor#execute
  • N * ClusterStateTaskListener#clusterStateProcessed
  • ClusterStateTaskExecutor#clusterStatePublished(ClusterStatePublicationEvent clusterPublicationEvent)

This guards us from failing following assertion: assertEquals(executor.batches.get(), executor.published.get());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in another channel, there's only one thread here so this can be a new Semaphore(1) instead, and the semaphore.acquire() call that happens when the task is executed should be assertTrue(semaphore.tryAcquire()) instead.

@@ -464,23 +464,39 @@ public void onFailure(Exception e) {
}

public void testClusterStateBatchedUpdates() throws BrokenBarrierException, InterruptedException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One important thing that this test seems not to check is that batches submitted with the same executor are executed together. It's a bit tricky to test because you have to block the master service while you're submitting the batches to make sure that they all arrive "at once". Still, I think this is something worth addressing either randomly in this test or else in a separate test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On reflection I think it doesn't make sense to try and extend this test to cover this extra invariant too. Let's leave it as it is and add a new test in a follow-up.

Copy link
Contributor Author

@idegtiarenko idegtiarenko Jan 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that is asserted here: https://github.com/idegtiarenko/elasticsearch/blob/665c374b90098f4dd34bcf238de21342a2b4c47a/server/src/test/java/org/elasticsearch/cluster/service/MasterServiceTests.java#L538-L545 (unless I miss something)

UPD: it is not checking "all at once" but at least it is verifying the grouping

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This asserts that every set of tasks that was submitted together is executed as an atomic unit. That's certainly important but in practice we only submit multiple tasks in CandidateJoinAccumulator when completing an election. We also want to know that tasks submitted with multiple calls to submitStateUpdateTask are executed as a batch.


TaskExecutor(List<Set<Task>> taskGroups) {
this.taskGroups = taskGroups;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this to assignments. Now executor will verify it only executes own tasks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't actually verify this yet, we're now only checking that every group of our own tasks is completely executed. Previously we were checking that every group of any executor's tasks is completely executed so technically this is now a weaker test.

I think this would be a good property to verify, but to do this we also need to say that every task we're executing belongs to one of our own groups.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added assertThat("All tasks should belong to this executor", totalCount, equalTo(tasks.size())); to verify that every single task from the list belongs to the current executor.

There is also the a check in the very end of the test that assertEquals(executor.assigned.get(), executor.executed.get()) that verifies that all assigned tasks are executed. We also have a check that no task is executed twice.

I believe this should cover everything, is it?

@idegtiarenko
Copy link
Contributor Author

@elasticmachine please run elasticsearch-ci/part-1

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I left a few cosmetic suggestions and one request.


TaskExecutor(List<Set<Task>> taskGroups) {
this.taskGroups = taskGroups;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't actually verify this yet, we're now only checking that every group of our own tasks is completely executed. Previously we were checking that every group of any executor's tasks is completely executed so technically this is now a weaker test.

I think this would be a good property to verify, but to do this we also need to say that every task we're executing belongs to one of our own groups.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}
assertThat("All tasks should belong to this executor", totalCount, equalTo(tasks.size()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

neat :)

@idegtiarenko idegtiarenko merged commit 805cd39 into elastic:master Jan 24, 2022
@idegtiarenko idegtiarenko deleted the 82644_master_service_tests branch January 24, 2022 13:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >refactoring Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants