Shard history retention leases #37165

Closed
@jasontedor

When a shard of a follower index is consuming shard history from its corresponding shard of its leader index, it could be that the history operations are no longer available on any of the leader shard copies. This can happen if some operations were soft deleted and subsequently merged away before the shard of the follower index had a chance to replicate them. This has catastrophic consequences for the follower index, as its only option to recover is now a full file-based recovery. In the context of cross-cluster replication, this can potentially be over a WAN with limited networking resources. During this file-based recovery, the follower index becomes unavailable, defeating the purpose of being an available copy of the leader index in another cluster.

One idea towards solving this problem is for the shard of a follower index to be able to leave a marker on the corresponding shard of its leader index that records where in shard history the following shard is. This marker would prevent any operation with a sequence number at or above the marker from being eligible to be merged away.

And thus was born the idea of shard history retention leases. Shard history retention leases are aimed at preventing shard history consumers from having to fall back to expensive file copy operations if shard history is not available from a certain point. These consumers include following indices in cross-cluster replication, and local shard recoveries. A future consumer will be the changes API.

Further, index lifecycle management requires coordinating with some of these consumers; otherwise it could remove the source before all consumers have finished reading all operations. The notion of shard history retention leases that we are introducing here will also be used to address this problem.

Shard history retention leases are a property of the replication group managed under the authority of the primary. A shard history retention lease is a combination of an identifier, a retaining sequence number, a timestamp indicating when the lease was acquired or renewed, and a string indicating the source of the lease. Being leases, they have a limited lifespan and expire if not renewed. The idea of these leases is that all operations at or above the minimum of all retaining sequence numbers will be retained during merges (which would otherwise clear away operations that are soft deleted). These leases will be periodically persisted to a dedicated state file and restored during recovery, and broadcast to replicas under certain circumstances.
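
To make the lease model above concrete, here is a minimal sketch in Python. It is an illustration only, not the planned implementation: the names `RetentionLease`, `RetentionLeases`, `add_or_renew`, `expire`, `min_retaining_seq_no`, and `may_merge_away` are hypothetical, as is the choice of milliseconds for timestamps. It also assumes that an operation whose sequence number equals the minimum retaining sequence number is retained.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RetentionLease:
    """One lease, carrying the four fields named in the description."""
    lease_id: str
    retaining_seq_no: int
    timestamp_millis: int  # when the lease was acquired or last renewed
    source: str            # free-form string naming the lease holder


class RetentionLeases:
    """Hypothetical primary-side registry of leases for one replication group."""

    def __init__(self, lifespan_millis: int) -> None:
        self.lifespan_millis = lifespan_millis
        self._leases: Dict[str, RetentionLease] = {}

    def add_or_renew(self, lease_id: str, retaining_seq_no: int,
                     now_millis: int, source: str) -> RetentionLease:
        # Acquiring and renewing are the same operation here: both record
        # a fresh timestamp alongside the retaining sequence number.
        lease = RetentionLease(lease_id, retaining_seq_no, now_millis, source)
        self._leases[lease_id] = lease
        return lease

    def expire(self, now_millis: int) -> List[str]:
        """Drop leases not renewed within the lifespan; return their ids."""
        expired = [
            lease_id for lease_id, lease in self._leases.items()
            if now_millis - lease.timestamp_millis > self.lifespan_millis
        ]
        for lease_id in expired:
            del self._leases[lease_id]
        return expired

    def min_retaining_seq_no(self) -> Optional[int]:
        """Minimum over all leases, or None when no lease is held."""
        if not self._leases:
            return None
        return min(lease.retaining_seq_no for lease in self._leases.values())


def may_merge_away(seq_no: int, leases: RetentionLeases) -> bool:
    """Whether a soft-deleted operation is eligible to be merged away.

    Assumption: operations at or above the minimum retaining sequence
    number are retained; with no leases, nothing is held back.
    """
    floor = leases.min_retaining_seq_no()
    return floor is None or seq_no < floor
```

For example, with a follower holding a lease at sequence number 42 and a recovery holding one at 100, operations below 42 may be merged away while 42 and above are kept. Once the follower's lease expires without renewal, the floor rises to 100.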

This issue is a meta-issue for tracking the progress of implementing shard history retention leases. We will proceed with implementing shard history retention leases along the following rough plan:

Labels: :Data Management/ILM+SLM, :Distributed Indexing/CCR, :Distributed Indexing/Distributed, :Distributed Indexing/Recovery, >feature, Meta
