-
Notifications
You must be signed in to change notification settings - Fork 25.3k
ClusterStateTaskListener usage refactoring in MasterServiceTests #82869
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ClusterStateTaskListener usage refactoring in MasterServiceTests #82869
Conversation
Today node removal tasks executed by the master have a separate ClusterStateTaskListener to feed back the result to the requester. It'd be preferable to use the task itself as the listener.
server/src/test/java/org/elasticsearch/cluster/service/MasterServiceTests.java
Show resolved
Hide resolved
Pinging @elastic/es-distributed (Team:Distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I left some comments.
int numberOfExecutors = Math.max(1, numberOfThreads / 4); | ||
final int numberOfThreads = randomIntBetween(2, 8); | ||
final int taskSubmissionsPerThread = randomIntBetween(1, 64); | ||
final int numberOfExecutors = Math.max(1, numberOfThreads / 4); | ||
final Semaphore semaphore = new Semaphore(numberOfExecutors); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This semaphore seems unnecessary to me, maybe I'm missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ClusterStateTaskListener#clusterStateProcessed
is called after the ClusterStateTaskExecutor#clusterStatePublished
then we can guarantee all tasks are completed just with processedStatesLatch
. Let me confirm this assumption
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was missing something: I didn't see the final call to Semaphore#acquire
🤦
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the order of execution is:
ClusterStateTaskExecutor#execute
- N *
ClusterStateTaskListener#clusterStateProcessed
ClusterStateTaskExecutor#clusterStatePublished(ClusterStatePublicationEvent clusterPublicationEvent)
This guards us from failing following assertion: assertEquals(executor.batches.get(), executor.published.get());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in another channel, there's only one thread here so this can be a new Semaphore(1)
instead, and the semaphore.acquire()
call that happens when the task is executed should be assertTrue(semaphore.tryAcquire())
instead.
server/src/test/java/org/elasticsearch/cluster/service/MasterServiceTests.java
Show resolved
Hide resolved
@@ -464,23 +464,39 @@ public void onFailure(Exception e) { | |||
} | |||
|
|||
public void testClusterStateBatchedUpdates() throws BrokenBarrierException, InterruptedException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One important thing that this test seems not to check is that batches submitted with the same executor are executed together. It's a bit tricky to test because you have to block the master service while you're submitting the batches to make sure that they all arrive "at once". Still, I think this is something worth addressing either randomly in this test or else in a separate test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On reflection I think it doesn't make sense to try and extend this test to cover this extra invariant too. Let's leave it as it is and add a new test in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that is asserted here: https://github.com/idegtiarenko/elasticsearch/blob/665c374b90098f4dd34bcf238de21342a2b4c47a/server/src/test/java/org/elasticsearch/cluster/service/MasterServiceTests.java#L538-L545 (unless I miss something)
UPD: it is not checking "all at once" but at least it is verifying the grouping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This asserts that every set of tasks that was submitted together is executed as an atomic unit. That's certainly important but in practice we only submit multiple tasks in CandidateJoinAccumulator
when completing an election. We also want to know that tasks submitted with multiple calls to submitStateUpdateTask
are executed as a batch.
|
||
TaskExecutor(List<Set<Task>> taskGroups) { | ||
this.taskGroups = taskGroups; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved this to assignments
. Now executor will verify it only
executes own tasks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't actually verify this yet, we're now only checking that every group of our own tasks is completely executed. Previously we were checking that every group of any executor's tasks is completely executed so technically this is now a weaker test.
I think this would be a good property to verify, but to do this we also need to say that every task we're executing belongs to one of our own groups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just added assertThat("All tasks should belong to this executor", totalCount, equalTo(tasks.size()));
to verify that every single task from the list belongs to the current executor.
There is also the a check in the very end of the test that assertEquals(executor.assigned.get(), executor.executed.get())
that verifies that all assigned tasks are executed. We also have a check that no task is executed twice.
I believe this should cover everything, is it?
@elasticmachine please run elasticsearch-ci/part-1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I left a few cosmetic suggestions and one request.
server/src/test/java/org/elasticsearch/cluster/service/MasterServiceTests.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/service/MasterServiceTests.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/service/MasterServiceTests.java
Outdated
Show resolved
Hide resolved
|
||
TaskExecutor(List<Set<Task>> taskGroups) { | ||
this.taskGroups = taskGroups; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't actually verify this yet, we're now only checking that every group of our own tasks is completely executed. Previously we were checking that every group of any executor's tasks is completely executed so technically this is now a weaker test.
I think this would be a good property to verify, but to do this we also need to say that every task we're executing belongs to one of our own groups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
} | ||
assertThat("All tasks should belong to this executor", totalCount, equalTo(tasks.size())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
neat :)
Today node removal tasks executed by the master have a separate
ClusterStateTaskListener to feed back the result to the requester.
It'd be preferable to use the task itself as the listener.
Rel: #82644