Description
Elasticsearch Version
master
Installed Plugins
N/A
Java Version
N/A
OS Version
N/A
Problem Description
Today the reconfigure cluster state update task blocks on Coordinator#mutex, which might be held by a long-running task.
I believe it only does this so it can safely access CoordinationState#joinVotes as part of the computation to determine which nodes are live. It should be fine to use a possibly-stale vote collection instead, captured when the task is submitted, as long as we also check that the term didn't change between submitting and executing the task.
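To make the proposal concrete, here is a rough sketch of the capture-then-validate pattern. It is not the actual Coordinator code: ReconfigureSketch, VoteSnapshot, captureSnapshot and liveNodes are hypothetical names, node IDs are plain strings, and the liveness computation is reduced to an intersection with the captured votes. It also assumes the term can be read without the mutex (modelled here as a volatile field).

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; none of these types are real Elasticsearch classes.
final class ReconfigureSketch {

    /** Immutable copy of the term and join votes, taken when the task is submitted. */
    static final class VoteSnapshot {
        final long term;
        final Set<String> joinVoteNodeIds;

        VoteSnapshot(long term, Set<String> joinVoteNodeIds) {
            this.term = term;
            this.joinVoteNodeIds = Set.copyOf(joinVoteNodeIds); // defensive, unmodifiable copy
        }
    }

    private final ReentrantLock mutex = new ReentrantLock(); // stands in for Coordinator#mutex
    private final Set<String> joinVotes = new HashSet<>();   // guarded by mutex
    private volatile long currentTerm;                       // written under mutex, read without it

    void startTerm(long term) {
        mutex.lock();
        try {
            currentTerm = term;
            joinVotes.clear(); // votes from an older term no longer count
        } finally {
            mutex.unlock();
        }
    }

    void addJoinVote(String nodeId) {
        mutex.lock();
        try {
            joinVotes.add(nodeId);
        } finally {
            mutex.unlock();
        }
    }

    /** Called at submission time; assumed to be a point where taking the mutex is cheap. */
    VoteSnapshot captureSnapshot() {
        mutex.lock();
        try {
            return new VoteSnapshot(currentTerm, joinVotes);
        } finally {
            mutex.unlock();
        }
    }

    /**
     * Called when the task executes. Never takes the mutex: it reads only the
     * snapshot and the volatile term. Returns empty if the term has changed
     * since submission, in which case the task should do nothing.
     */
    Optional<Set<String>> liveNodes(VoteSnapshot snapshot, Set<String> knownNodeIds) {
        if (snapshot.term != currentTerm) {
            return Optional.empty();
        }
        Set<String> live = new TreeSet<>(knownNodeIds);
        live.retainAll(snapshot.joinVoteNodeIds); // simplified stand-in for the liveness check
        return Optional.of(live);
    }
}
```

In this sketch a vote that arrives after submission is simply not seen by the task, which is the "possibly-stale" behaviour described above, and a term bump between submission and execution turns the task into a no-op.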
Additionally, the master service blocks on Coordinator#mutex when retrieving the current cluster state against which it will execute some tasks.
That's probably not a big deal: the main long-running task is writing out the cluster state during a publication, when the master service wouldn't be executing tasks anyway.
Steps to Reproduce
N/A
Logs (if relevant)
No response