Perform more startup consistency checks before writing anything to disk #44624

Open
@DaveCTurner

Today when a node starts up after an upgrade it might write upgraded versions of at least these separate structures to disk:

  • the keystore
  • the node metadata file
  • (some portion of) the cluster metadata

None of these has a forwards-compatible representation, so an older version cannot read them once they have been rewritten, and each one is loaded, checked, and then rewritten independently of the others. As a result, a failed upgrade can leave a node completely stuck:

  • if the node metadata file is invalid (e.g. comes from a version that is too old to support an in-place upgrade) then we do not discover this until after upgrading the keystore to the latest version. This version of the node cannot start up due to the invalid node metadata file, but an attempt to downgrade to the previous working version will also fail because of the upgraded keystore.

  • if the cluster metadata is invalid (e.g. contains an index from an unsupported version) then we do not discover this until after upgrading both the keystore and the node metadata file to the latest versions. Again, this version of the node cannot start up due to the invalid cluster metadata, but an attempt to downgrade to the previous working version will also fail because of the upgraded keystore and node metadata file.

One common path into this kind of situation is by upgrading without first getting a clean bill of health from the upgrade assistant.

We can make this experience better by performing more consistency checks before writing anything to disk at startup, to avoid blocking a subsequent downgrade in cases where the upgrade is obviously infeasible.
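To make the proposal concrete, here is a minimal sketch of a "check everything, then write" startup sequence. It is not taken from the Elasticsearch codebase: `UpgradableStructure`, `checkThenUpgrade`, and `InMemoryStructure` are hypothetical names invented for illustration, and the real keystore, node metadata, and cluster metadata loaders would need their own read-only checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class StartupUpgradeSketch {

    /** Hypothetical abstraction over one on-disk structure (keystore, node metadata, ...). */
    interface UpgradableStructure {
        String name();

        /** Read-only check: returns a description of the problem if an in-place upgrade is impossible. */
        Optional<String> check();

        /** Rewrites the structure in the latest format. Must only run after every check has passed. */
        void writeUpgraded();
    }

    /**
     * Phase 1 checks every structure without touching the disk.
     * Phase 2 rewrites them, but only if phase 1 found no problems,
     * so a failed upgrade leaves the old on-disk formats intact for a downgrade.
     */
    static List<String> checkThenUpgrade(List<UpgradableStructure> structures) {
        List<String> problems = new ArrayList<>();
        for (UpgradableStructure s : structures) {
            s.check().ifPresent(p -> problems.add(s.name() + ": " + p));
        }
        if (problems.isEmpty()) {
            for (UpgradableStructure s : structures) {
                s.writeUpgraded();
            }
        }
        return problems;
    }

    /** In-memory stand-in used only to demonstrate the control flow. */
    static class InMemoryStructure implements UpgradableStructure {
        final String name;
        final String problem; // null means the structure can be upgraded
        boolean written = false;

        InMemoryStructure(String name, String problem) {
            this.name = name;
            this.problem = problem;
        }

        @Override public String name() { return name; }
        @Override public Optional<String> check() { return Optional.ofNullable(problem); }
        @Override public void writeUpgraded() { written = true; }
    }

    public static void main(String[] args) {
        InMemoryStructure keystore = new InMemoryStructure("keystore", null);
        InMemoryStructure nodeMetadata = new InMemoryStructure("node metadata", "version too old");
        List<String> problems = checkThenUpgrade(List.of(keystore, nodeMetadata));
        System.out.println("problems: " + problems);
        System.out.println("keystore rewritten: " + keystore.written);
    }
}
```

In this sketch the invalid node metadata is detected before the keystore is rewritten, so the keystore stays in its old format and a downgrade to the previous version remains possible.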

    Labels

      • :Distributed Coordination/Cluster Coordination: Cluster formation and cluster state publication, including cluster membership and fault detection.
      • >bug
      • Team:Distributed (Obsolete): Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
