Description
Is your feature request related to a problem? Please describe.
Background
The cross-cluster replication feature follows a logical replication model. Each follower pulls the latest operations from the corresponding shards in the leader index and replays them on the follower side. To keep these operations available on the leader cluster, retention leases are acquired on the leader index. Once the operations have been replayed on the follower side, the retention leases are renewed. The feature leverages and extends the existing peer-recovery infrastructure.
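To make the pull-and-renew cycle concrete, here is a minimal toy model in Python. It is only a sketch: `LeaderShard`, `FollowerShard`, and their methods are hypothetical names invented for illustration and do not correspond to the actual classes in the replication plugin or the engine.

```python
from dataclasses import dataclass, field


@dataclass
class LeaderShard:
    """Toy leader shard (hypothetical, not the real engine classes)."""
    ops: list = field(default_factory=list)      # ops[i] has sequence number i
    leases: dict = field(default_factory=dict)   # follower id -> retaining seq no

    def index(self, doc):
        self.ops.append(doc)

    def renew_lease(self, follower_id, retaining_seq_no):
        # A lease asks the leader to keep every op >= retaining_seq_no.
        self.leases[follower_id] = retaining_seq_no

    def fetch(self, from_seq_no, max_ops):
        # Return (seq_no, doc) pairs starting at from_seq_no.
        return list(enumerate(self.ops))[from_seq_no:from_seq_no + max_ops]


@dataclass
class FollowerShard:
    """Toy follower shard that replays operations pulled from the leader."""
    follower_id: str
    applied: list = field(default_factory=list)

    def pull(self, leader, batch_size=2):
        next_seq_no = len(self.applied)
        for _seq_no, doc in leader.fetch(next_seq_no, batch_size):
            self.applied.append(doc)  # replay on the follower side
        # After replaying, renew the lease at the next op still needed.
        leader.renew_lease(self.follower_id, len(self.applied))


leader = LeaderShard()
for doc in ["a", "b", "c"]:
    leader.index(doc)

follower = FollowerShard("follower-1")
follower.pull(leader)   # replays "a" and "b"
follower.pull(leader)   # replays "c"
```

In this model the lease always points at the first operation the follower has not yet replayed, so the leader knows which operations it can no longer be asked for.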
Problem
Retention leases preserve operations for each shard at the Lucene level (the same mechanism is used for peer recovery within the cluster).
During performance benchmarking of the replication feature under high indexing workloads, fetching the latest operations from the leader cluster increased CPU usage by up to ~8-10% because of Lucene stored-fields decompression.
Describe the solution you'd like
Solution
All of the latest operations are available in the translog in uncompressed form. Currently, the translog has no mechanism to prune operations based on retention leases. If translog pruning also took retention leases into account, fetch operations could be served directly from the translog, saving CPU cycles.
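The intended read path can be sketched as follows. This is an illustration under assumptions, not the actual implementation: `fetch_operations` and `read_from_lucene` are hypothetical names, the translog is modelled as a plain dict keyed by sequence number, and the fallback to Lucene when the translog is incomplete is an assumed behavior.

```python
def fetch_operations(from_seq_no, to_seq_no, translog, read_from_lucene):
    """Serve a fetch from the translog when it holds the whole range.

    translog: dict mapping sequence number -> operation (toy stand-in).
    read_from_lucene: callable used as the slower fallback path.
    Returns (operations, source) so the caller can see which path was used.
    """
    wanted = range(from_seq_no, to_seq_no + 1)
    if all(seq_no in translog for seq_no in wanted):
        # Cheap path: operations are stored uncompressed in the translog.
        return [translog[seq_no] for seq_no in wanted], "translog"
    # Fallback: pays the stored-fields decompression cost.
    return read_from_lucene(from_seq_no, to_seq_no), "lucene"


def fake_lucene_reader(from_seq_no, to_seq_no):
    # Stand-in for a Lucene-backed history read (hypothetical).
    return [f"op-{n}" for n in range(from_seq_no, to_seq_no + 1)]


translog = {n: f"op-{n}" for n in range(5, 10)}  # ops 0-4 already pruned

recent = fetch_operations(6, 8, translog, fake_lucene_reader)
older = fetch_operations(3, 6, translog, fake_lucene_reader)
```

Here `recent` is served from the translog, while `older` reaches back past the pruned range and falls back to the Lucene path.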
Details
- Introduce a new dynamic index-level setting that prunes translog operations based on retention leases.
- For indices with this setting enabled, the translog deletion policy is updated to take retention leases into account. This ensures that operations up to a certain threshold remain available in the translog, so fetch operations don't have to query Lucene for them.
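A lease-aware deletion policy could look roughly like the sketch below. Everything here is an assumption made for illustration: `Generation`, `min_required_seq_no`, `generations_to_keep`, and the `lease_pruning_enabled` flag are hypothetical names, and the real policy may also apply size or age limits (the "threshold" mentioned above) that are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generation:
    """One translog generation covering a range of sequence numbers (toy model)."""
    gen_id: int
    min_seq_no: int
    max_seq_no: int


def min_required_seq_no(default_min_seq_no, lease_seq_nos, lease_pruning_enabled):
    """Lowest sequence number that must stay in the translog."""
    if lease_pruning_enabled and lease_seq_nos:
        # Keep everything any retention lease still needs.
        return min(default_min_seq_no, min(lease_seq_nos))
    # Setting disabled or no leases: existing behavior is unchanged.
    return default_min_seq_no


def generations_to_keep(generations, default_min_seq_no, lease_seq_nos,
                        lease_pruning_enabled):
    """Keep every generation that still holds a required operation."""
    floor = min_required_seq_no(default_min_seq_no, lease_seq_nos,
                                lease_pruning_enabled)
    return [g for g in generations if g.max_seq_no >= floor]


generations = [
    Generation(1, 0, 99),
    Generation(2, 100, 199),
    Generation(3, 200, 299),
]
leases = [120, 250]  # retaining sequence numbers held by two followers

with_setting = generations_to_keep(generations, 200, leases, True)
without_setting = generations_to_keep(generations, 200, leases, False)
```

With the flag on, generation 2 survives because one lease still needs sequence number 120; with it off, only generation 3 is kept, which matches the current behavior of ignoring leases.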
Describe alternatives you've considered
N/A
Additional context
N/A