Decouple Replication lag from logic to mark replicas as stale #8453

Closed
@ankitkala

Description

For segment replication (Segrep), we rely on replication lag to decide whether writes need to be throttled and whether a replica needs to be marked as stale. With Segrep's integration with remote store, the lag also includes the time the primary takes to upload segments to the remote store. That makes sense, and the upload time should be accounted for in the replication lag. However, it creates a problem for replicas when segment upload time is high (e.g. during merges): we shouldn't be kicking out replicas because of slow segment uploads to the remote store. In a large-scale event (say, one involving the remote store), this can further aggravate the situation by kicking out all the replicas in a cluster.

We need to decouple the logic for marking replicas as stale from replication lag.
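
One possible reading of this decoupling, as a minimal sketch rather than the actual fix: keep the write-throttling decision on the full replication lag (upload plus copy), but base the stale decision only on the time the replica itself spends copying segments. Every name and threshold below (`ReplicaLagSketch`, `THROTTLE_LAG_LIMIT`, `STALE_COPY_LIMIT`, and the method names) is hypothetical and does not come from the OpenSearch codebase.

```java
import java.time.Duration;

// Hypothetical sketch: none of these names or limits exist in OpenSearch.
final class ReplicaLagSketch {

    // Illustrative thresholds (assumptions, not OpenSearch defaults).
    static final Duration THROTTLE_LAG_LIMIT = Duration.ofMinutes(5);
    static final Duration STALE_COPY_LIMIT = Duration.ofMinutes(10);

    /** End-to-end replication lag: primary upload time plus replica copy time. */
    static Duration replicationLag(Duration primaryUpload, Duration replicaCopy) {
        return primaryUpload.plus(replicaCopy);
    }

    /** Throttling still uses the full lag, including the remote store upload. */
    static boolean shouldThrottleWrites(Duration primaryUpload, Duration replicaCopy) {
        return replicationLag(primaryUpload, replicaCopy).compareTo(THROTTLE_LAG_LIMIT) > 0;
    }

    /** Staleness uses only the replica's own copy time, so a slow upload alone cannot fail it. */
    static boolean shouldMarkStale(Duration replicaCopy) {
        return replicaCopy.compareTo(STALE_COPY_LIMIT) > 0;
    }

    public static void main(String[] args) {
        // A long merge upload on the primary, followed by a quick copy on the replica.
        Duration upload = Duration.ofMinutes(20);
        Duration copy = Duration.ofSeconds(30);
        System.out.println("throttle writes: " + shouldThrottleWrites(upload, copy));
        System.out.println("mark stale:      " + shouldMarkStale(copy));
    }
}
```

With this split, a slow merge upload can still trigger backpressure on writes, but it cannot by itself cause a healthy replica to be failed.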

Metadata

Labels

Storage, bug, discuss, distributed framework, v2.10.0
