Improve error messages in ccr during rolling upgrade #39230

Open
@martijnvg

Description

CCR is likely to fail or behave incorrectly during a rolling upgrade, or whenever clusters run with mixed node versions. With unidirectional index following, if the follower cluster is upgraded before the leader cluster, index following continues to work while the rolling upgrade is performed. However, if the leader cluster is upgraded before the follower cluster, or with bi-directional index following, index following may fail during a rolling upgrade.

During a rolling upgrade, CCR may fail in the following places:

  • Index settings and mappings that the follower cluster does not yet support may be replicated from the leader cluster, because the follower cluster runs an older Elasticsearch version than the leader cluster. Currently the shard follow tasks fail with a non-retryable error, which can be inspected in the ccr and follow stats APIs. These errors may not immediately indicate that the underlying problem is that the clusters are running mixed versions. For example, shard follow tasks may fail with a MapperParsingException if a mapper field type does not exist.
  • When the put follow API restores the leader index shards into the follower index shards, the nodes in the follower cluster may not be able to read the Lucene index, because they have not been upgraded yet. Bootstrapping the follower index can also fail, because the leader index contains mappings and settings that the follower cluster doesn't yet understand. In either case, the errors (IndexFormatTooNewException) that may occur while the put follow API executes are not very descriptive.

To avoid the non-descriptive errors above, CCR should always throw a descriptive error indicating that the actual problem occurred because not all nodes in the current and remote clusters have been upgraded.
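
One way to picture the descriptive error is a small wrapper that keeps the low-level failure as the cause and puts the version mismatch in the message. This is only a rough sketch: `MixedVersionErrors`, `describe`, and the plain version strings are hypothetical names for illustration, not existing Elasticsearch classes or APIs.

```java
// Hypothetical helper, not part of Elasticsearch: illustrates the idea only.
final class MixedVersionErrors {

    private MixedVersionErrors() {}

    /**
     * Turns a low-level failure (for example a mapping or Lucene format error)
     * into an error that names the likely root cause when the two clusters
     * run different versions. The original failure is kept as the cause.
     */
    static RuntimeException describe(Exception cause, String localVersion, String remoteVersion) {
        if (localVersion.equals(remoteVersion)) {
            // Same version on both sides: the upgrade does not explain the failure,
            // so surface the original error unchanged where possible.
            return cause instanceof RuntimeException
                ? (RuntimeException) cause
                : new RuntimeException(cause);
        }
        String message = "ccr operation failed, most likely because the clusters run mixed versions"
            + " (local [" + localVersion + "], remote [" + remoteVersion + "]);"
            + " upgrade all nodes in both clusters and retry";
        return new IllegalStateException(message, cause);
    }
}
```

With this shape, the original exception still appears in the stack trace, while the first line a user sees in the stats output points at the upgrade state.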

The put follow API should also refuse to execute if the nodes in the current and remote clusters are not on the same Elasticsearch version.
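
The proposed guard could look roughly like the following. It is a minimal sketch under stated assumptions: `PutFollowVersionGuard` and `ensureSameVersion` are hypothetical names, and node versions are modelled as plain strings rather than Elasticsearch's own version type.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical pre-flight check, not part of Elasticsearch: illustrates the proposal only.
final class PutFollowVersionGuard {

    private PutFollowVersionGuard() {}

    /**
     * Rejects the request unless every node in the local and the remote
     * cluster reports the same version string.
     */
    static void ensureSameVersion(List<String> localNodeVersions, List<String> remoteNodeVersions) {
        // Collect the distinct versions seen across both clusters, sorted for a stable message.
        Set<String> distinct = new TreeSet<>(localNodeVersions);
        distinct.addAll(remoteNodeVersions);
        if (distinct.size() > 1) {
            throw new IllegalStateException(
                "cannot follow index: nodes in the local and remote clusters run mixed versions "
                    + distinct + "; complete the rolling upgrade of both clusters first");
        }
    }
}
```

Failing before the restore starts means the user sees the version mismatch directly, instead of a later IndexFormatTooNewException or mapping error.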

Metadata

Assignees

No one assigned

    Labels

    :Distributed Indexing/CCR - Issues around the Cross Cluster State Replication features
    >enhancement
    Team:Distributed (Obsolete) - Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

