
Retain history for peer recovery using leases #41536

Closed
@DaveCTurner

The goal is that we can perform an operations-based recovery for every "reasonable" shard copy C. For that, the following must hold for each such copy:

  • There is a peer recovery retention lease L corresponding with C.
  • Every in-sync shard copy has a complete history of operations above the retained seqno of L.
  • The retained seqno r of L is no greater than the local checkpoint of the last safe commit of C.
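The three conditions above can be read as a single predicate. The following is a minimal sketch of that reading, not Elasticsearch code: `Lease`, `has_complete_history_above` and `ops_based_recovery_possible` are hypothetical names, histories are modelled as plain sets of seqnos, and "above r" is taken to mean "strictly greater than r".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    """Hypothetical stand-in for a peer recovery retention lease."""
    copy_id: str
    retained_seqno: int  # the "r" of the lease


def has_complete_history_above(history: set[int], seqno: int, max_seqno: int) -> bool:
    """True if `history` holds every operation with seqno in (seqno, max_seqno]."""
    return all(s in history for s in range(seqno + 1, max_seqno + 1))


def ops_based_recovery_possible(
    copy_id: str,
    safe_commit_local_checkpoint: int,
    leases: dict[str, Lease],
    in_sync_histories: list[set[int]],
    max_seqno: int,
) -> bool:
    # Condition 1: a lease L exists for the copy C.
    lease = leases.get(copy_id)
    if lease is None:
        return False
    # Condition 3: r is no greater than the local checkpoint of C's last safe commit.
    if lease.retained_seqno > safe_commit_local_checkpoint:
        return False
    # Condition 2: every in-sync copy has the complete history above r.
    return all(
        has_complete_history_above(h, lease.retained_seqno, max_seqno)
        for h in in_sync_histories
    )
```

With a lease retaining seqno 5, a copy whose last safe commit has local checkpoint 7 qualifies as long as every in-sync copy still holds operations 6 and above; a copy whose safe commit is at 3 does not, because the lease no longer covers everything it is missing.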

Reasonable shard copies comprise all the copies that are currently being tracked, as well as all the copies that "might be a recovery target": if the shard is not fully allocated then any copy that has been tracked within the last index.soft_deletes.retention_lease.period (12h by default) might reasonably be a recovery target.
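As an illustration of the definition above, here is a small sketch of the "reasonable copy" test. The function name, its parameters and the use of seconds for timestamps are assumptions made for this example; only the 12h period comes from the text.

```python
from __future__ import annotations

from typing import Optional

# 12h, matching the retention lease period mentioned above.
RETENTION_LEASE_PERIOD_SECONDS = 12 * 60 * 60


def is_reasonable_copy(
    currently_tracked: bool,
    last_tracked_at: Optional[float],
    now: float,
    shard_fully_allocated: bool,
    retention_period: float = RETENTION_LEASE_PERIOD_SECONDS,
) -> bool:
    """Hypothetical check for whether a shard copy counts as "reasonable"."""
    # Every currently-tracked copy is reasonable.
    if currently_tracked:
        return True
    # An untracked copy can only be a recovery target if the shard is
    # missing a copy, and only if this copy has been tracked at some point.
    if shard_fully_allocated or last_tracked_at is None:
        return False
    # It must have been tracked within the retention period.
    return now - last_tracked_at <= retention_period
```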

We also require that history is eventually released: in a stable cluster, for every operation with seqno s below the max seqno (MSN) of a replication group, eventually there are no leases that retain s:

  • Every active shard copy eventually advances the local checkpoint of its last safe commit (LCPoSC) past s.
  • Every lease for an active shard copy eventually also passes s.
  • Every inactive shard copy eventually either becomes active or else its lease expires.
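The release argument can be sketched as three small steps: leases of active copies advance, leases of inactive copies expire, and an operation is released once no remaining lease retains it. This is a toy model under stated assumptions: `TimedLease`, `advance_lease`, `expire_leases` and `is_released` are hypothetical names, a lease with retained seqno r is taken to retain operations strictly above r, and expiry is modelled as "not renewed within the retention period".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TimedLease:
    """Hypothetical lease carrying its last renewal time (in seconds)."""
    copy_id: str
    retained_seqno: int
    renewed_at: float


def advance_lease(lease: TimedLease, safe_commit_local_checkpoint: int, now: float) -> TimedLease:
    """Renew a lease for an active copy; the retained seqno never moves backwards."""
    new_seqno = max(lease.retained_seqno, safe_commit_local_checkpoint)
    return TimedLease(lease.copy_id, new_seqno, now)


def expire_leases(
    leases: list[TimedLease],
    active_copy_ids: set[str],
    now: float,
    retention_period: float,
) -> list[TimedLease]:
    """Keep leases of active copies; drop inactive ones not renewed within the period."""
    return [
        lease
        for lease in leases
        if lease.copy_id in active_copy_ids or now - lease.renewed_at <= retention_period
    ]


def is_released(seqno: int, leases: list[TimedLease]) -> bool:
    """An operation is released once every remaining lease has reached or passed it."""
    return all(lease.retained_seqno >= seqno for lease in leases)
```

In this model an operation with seqno 2 stays retained while an inactive copy holds a lease at seqno 1, and becomes releasable only after that lease expires or the copy becomes active and advances it.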

Concretely, this should ensure that operations-based recoveries are possible in the following cases (subject to the copy being allocated back to the same node):

  • a shard copy C is offline for a short period (<12h)
    • even if the primary is relocated or a replica is promoted to primary while C is offline.
    • even if C was part of a closed/frozen/readonly index that was opened while C was offline
      • but not if the index was closed/frozen again before C comes back
      • TBD: maybe we are ok with this being a file-based recovery?
  • a full-cluster restart

This breaks into a few conceptually-separate pieces:

Follow-up work, out of scope for the feature branches:

  • Adjust translog retention

    • Should we retain translog generations according to retention leases too?
    • Trim translog files eagerly during the "verify-before-close" step for closed/frozen indices (Trim translog for closed indices #43156)
    • Properly support peer-recovery retention leases on indices that are not using soft deletes too.
  • Make the ReplicaShardAllocator sensitive to leases, so that it prefers to select a location for each replica that only needs an ops-based recovery. (relates to Replica allocation consider no-op #42518)

  • Seqno-based synced flush: if a copy's local checkpoint (LCP) equals the MSN then it needs no recovery. (relates to Replica allocation consider no-op #42518)
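The last two follow-up items suggest an ordering of candidate locations: no recovery needed, then ops-based, then file-based. The sketch below illustrates that ordering only; it is not how ReplicaShardAllocator is implemented. `Candidate`, `RecoveryKind`, `recovery_kind` and `preferred_node` are hypothetical names, and `has_lease` is a simplification (a fuller check would also compare the lease's retained seqno with the copy's safe commit).

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class RecoveryKind(IntEnum):
    """Lower values are cheaper recoveries."""
    NO_OP = 0
    OPS_BASED = 1
    FILE_BASED = 2


@dataclass(frozen=True)
class Candidate:
    """Hypothetical view of a node that might host the replica."""
    node_id: str
    local_checkpoint: Optional[int]  # None if the node holds no existing copy
    has_lease: bool                  # simplification: a usable lease exists for this copy


def recovery_kind(candidate: Candidate, max_seqno: int) -> RecoveryKind:
    if candidate.local_checkpoint is None:
        return RecoveryKind.FILE_BASED
    # LCP == MSN: the copy already has every operation.
    if candidate.local_checkpoint == max_seqno:
        return RecoveryKind.NO_OP
    # A lease means the missing operations are still retained.
    if candidate.has_lease:
        return RecoveryKind.OPS_BASED
    return RecoveryKind.FILE_BASED


def preferred_node(candidates: list[Candidate], max_seqno: int) -> str:
    """Pick the candidate with the cheapest recovery; ties keep list order."""
    return min(candidates, key=lambda c: recovery_kind(c, max_seqno)).node_id
```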


BWC issues: during a rolling upgrade, we may migrate a primary onto a new node without first establishing the appropriate leases. They can't be established before or during this promotion, so we must weaken the assertions so that they only apply to sufficiently-newly-created indices. We will still establish leases properly during peer recovery, and can establish them lazily on older indices, but they may not retain all the right history when first created.

Closed replicated indices issues: a closed index permits no replicated actions, but should not need any history to be retained. We cannot replay history into a closed index, so all recoveries must be file-based, so there's no real need for leases; moreover any existing peer-recovery retention leases (PRRLs) will not be retaining any history. We cannot assert that all the copies of a replicated closed index have a corresponding lease without performing replicated write actions to create such leases as we create new replicas, nor can we assert that there are no leases on a replicated closed index, since again this would require replicated write actions. We elect to ignore PRRLs on closed indices: they might exist, but they might not, and either way is fine.
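Taken together, the BWC and closed-index caveats describe when the "every tracked copy has a lease" assertion applies at all. A minimal sketch of such a guarded check follows, assuming versions are compared as tuples and that `min_version` is some hypothetical cut-off for "sufficiently newly created"; neither the function names nor the cut-off come from the source.

```python
from __future__ import annotations


def should_assert_leases(
    index_is_closed: bool,
    index_created_version: tuple[int, int, int],
    min_version: tuple[int, int, int],
) -> bool:
    """Decide whether the lease invariant is enforced for this index."""
    # PRRLs on closed indices are ignored: they may or may not exist.
    if index_is_closed:
        return False
    # Older indices may have been migrated without leases during a rolling upgrade.
    return index_created_version >= min_version


def check_lease_invariant(
    tracked_copy_ids: set[str],
    lease_copy_ids: set[str],
    index_is_closed: bool,
    index_created_version: tuple[int, int, int],
    min_version: tuple[int, int, int],
) -> bool:
    """True if the invariant holds or does not apply to this index."""
    if not should_assert_leases(index_is_closed, index_created_version, min_version):
        return True
    # Every tracked copy needs a lease; extra leases are tolerated.
    return tracked_copy_ids <= lease_copy_ids
```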
