Master thread blocks on Coordinator#mutex #83682

Open
@DaveCTurner

Description

Elasticsearch Version

master

Installed Plugins

N/A

Java Version

N/A

OS Version

N/A

Problem Description

Today the reconfigure cluster state update task blocks on Coordinator#mutex, which might be held by a long-running task:

    public ClusterState execute(ClusterState currentState) {
        reconfigurationTaskScheduled.set(false);
        synchronized (mutex) {
            return improveConfiguration(currentState);
        }
    }

I believe it takes the mutex only so that it can safely access CoordinationState#joinVotes as part of the computation to determine which nodes are live. It should be fine to use a possibly-stale vote collection instead, captured when the task is submitted, as long as we also check that the term didn't change between submitting and executing the task.
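To make the idea concrete, here is a minimal sketch of capturing the votes and the term at submit time and re-checking the term at execution time. Every class here (ReconfigureSketch, its nested ClusterState, ReconfigureTask and Coordinator) is a simplified, hypothetical stand-in rather than the real Elasticsearch code, and improveConfiguration is reduced to a placeholder. It also assumes the term can be read from the cluster state handed to execute, so the task itself never touches the mutex.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified, hypothetical stand-ins: none of these are the real Elasticsearch classes.
final class ReconfigureSketch {

    /** Minimal cluster state: just a term and a voting configuration. */
    static final class ClusterState {
        final long term;
        final Set<String> votingConfiguration;

        ClusterState(long term, Set<String> votingConfiguration) {
            this.term = term;
            this.votingConfiguration = Set.copyOf(votingConfiguration);
        }
    }

    /** Task that carries its own vote snapshot, so execute() takes no lock. */
    static final class ReconfigureTask {
        private final long submittedTerm;
        private final Set<String> joinVotesSnapshot;

        ReconfigureTask(long submittedTerm, Set<String> joinVotes) {
            this.submittedTerm = submittedTerm;
            this.joinVotesSnapshot = Set.copyOf(joinVotes); // immutable, possibly stale later
        }

        ClusterState execute(ClusterState currentState) {
            if (currentState.term != submittedTerm) {
                // The term changed between submit and execute, so the snapshot
                // belongs to an older term: leave the state unchanged.
                return currentState;
            }
            return improveConfiguration(currentState, joinVotesSnapshot);
        }

        // Placeholder only: the nodes in the snapshot simply become the voting configuration.
        private static ClusterState improveConfiguration(ClusterState state, Set<String> liveNodes) {
            return new ClusterState(state.term, liveNodes);
        }
    }

    /** Holds the mutex-guarded state; the lock is taken only while submitting. */
    static final class Coordinator {
        private final Object mutex = new Object();
        private final long term = 1;                           // guarded by mutex
        private final Set<String> joinVotes = new HashSet<>(); // guarded by mutex

        void addJoinVote(String nodeId) {
            synchronized (mutex) {
                joinVotes.add(nodeId);
            }
        }

        ReconfigureTask scheduleReconfiguration() {
            synchronized (mutex) { // held briefly, on the submitting thread
                return new ReconfigureTask(term, joinVotes);
            }
        }
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator();
        coordinator.addJoinVote("node-a");
        coordinator.addJoinVote("node-b");
        ReconfigureTask task = coordinator.scheduleReconfiguration();

        ClusterState sameTerm = task.execute(new ClusterState(1, Set.of("node-a")));
        ClusterState newerTerm = task.execute(new ClusterState(2, Set.of("node-a")));
        System.out.println("same term, voters: " + sameTerm.votingConfiguration.size());
        System.out.println("newer term, voters: " + newerTerm.votingConfiguration.size());
    }
}
```

The trade-off in this sketch is that a vote arriving after submission is not seen by the task, which is the "possibly-stale" behaviour described above; the term check only guards against applying a snapshot from a different term.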

Additionally, the master service blocks on Coordinator#mutex when retrieving the current cluster state against which it will execute some tasks:

    ClusterState getStateForMasterService() {
        synchronized (mutex) {

That's probably not a big deal: the main long-running task is writing out the cluster state during a publication, when the master service wouldn't be executing tasks anyway.

Steps to Reproduce

N/A

Logs (if relevant)

No response

Metadata

Assignees

No one assigned

    Labels

    :Distributed Coordination/Cluster Coordination - Cluster formation and cluster state publication, including cluster membership and fault detection.
    >tech debt
    Team:Distributed (Obsolete) - Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
