Decouple Replication lag from logic to mark replicas as stale

For Segrep, we rely on replication lag to determine whether writes need to be throttled or whether a replica needs to be marked as stale. With Segrep's integration with remote store, this lag also includes the time the primary takes to upload segments to the remote store. This makes sense, and the upload time should be accounted for in the replication lag. However, it creates a problem for replicas when the segment upload time is high (e.g. for merges): we shouldn't kick replicas out because of slow segment uploads to the remote store. In a large-scale event (say, one affecting the remote store), this could aggravate the situation by kicking out all the replicas in a cluster.

We need to decouple the logic for marking replicas as stale from replication lag. 
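
One possible shape for this decoupling, as a rough sketch only: keep the full lag (including primary upload time) for the write-throttling decision, but base the stale decision on the replica-side portion alone. Everything below is hypothetical and for illustration: `ReplicaLag`, its fields, both functions, and the thresholds are not existing OpenSearch classes or settings, and splitting the lag into these two components is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaLag:
    """Hypothetical breakdown of one replica's replication lag, in seconds."""

    # Assumed: time the primary spent uploading segments to the remote store.
    primary_upload_seconds: float
    # Assumed: time the replica itself has spent catching up.
    replica_sync_seconds: float

    @property
    def total_seconds(self) -> float:
        # Overall replication lag still accounts for the upload time.
        return self.primary_upload_seconds + self.replica_sync_seconds


def should_throttle_writes(lag: ReplicaLag, limit_seconds: float) -> bool:
    # Throttling keeps using the full lag, upload time included.
    return lag.total_seconds > limit_seconds


def should_mark_stale(lag: ReplicaLag, limit_seconds: float) -> bool:
    # Staleness ignores primary upload time, so a slow upload
    # (e.g. a large merge) does not by itself fail the replica.
    return lag.replica_sync_seconds > limit_seconds
```

With this split, a replica behind only because the primary's upload is slow could still trigger write throttling, but would not be marked as stale.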


Decouple Replication lag from logic to mark replicas as stale #8453
