Description
Recently, metrics for Segment Replication were added to the index-, node-, and cluster-level APIs. At each of these levels, the metrics include max bytes behind, max replication lag, and total bytes behind.
These metrics are computed by the primary shard for each replication group within its ReplicationTracker. They were intended to be used to apply backpressure when the primary identifies that its replicas are falling behind. Because they are computed on the primary, they no longer match their labels once they are rolled up. For example, at the node level, the bytes-behind metrics actually report the max/total number of bytes by which the primaries on that node are ahead of their replicas, which are distributed across the cluster. This is the wrong metric for identifying lagging nodes, and it is misleading.
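To make the rollup problem concrete, here is a minimal Java sketch. `NodeRollupSketch`, `ReplicaCopy`, and both methods are hypothetical illustrations, not OpenSearch APIs, and each replica copy is assumed to report a single bytes-behind value. Grouping by the primary's node (effectively what rolling up primary-computed stats does) charges a lagging replica on `node-2` to `node-1`, while grouping by the replica's node attributes the lag to the node that is actually behind.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class NodeRollupSketch {
    // Hypothetical: one replica copy of a shard, the nodes hosting its primary and itself,
    // and how many bytes the replica is behind its primary.
    public record ReplicaCopy(String shardId, String primaryNode, String replicaNode, long bytesBehind) {}

    // Primary-perspective rollup: lag is summed under the node hosting the primary.
    public static Map<String, Long> totalByPrimaryNode(List<ReplicaCopy> copies) {
        return copies.stream().collect(Collectors.groupingBy(
                ReplicaCopy::primaryNode, TreeMap::new, Collectors.summingLong(ReplicaCopy::bytesBehind)));
    }

    // Replica-perspective rollup: lag is summed under the node hosting the lagging replica.
    public static Map<String, Long> totalByReplicaNode(List<ReplicaCopy> copies) {
        return copies.stream().collect(Collectors.groupingBy(
                ReplicaCopy::replicaNode, TreeMap::new, Collectors.summingLong(ReplicaCopy::bytesBehind)));
    }

    public static void main(String[] args) {
        List<ReplicaCopy> copies = List.of(
                new ReplicaCopy("[idx][0]", "node-1", "node-2", 500L), // replica on node-2 is lagging
                new ReplicaCopy("[idx][1]", "node-1", "node-3", 0L),
                new ReplicaCopy("[idx][2]", "node-2", "node-1", 0L));
        System.out.println(totalByPrimaryNode(copies)); // prints {node-1=500, node-2=0}
        System.out.println(totalByReplicaNode(copies)); // prints {node-1=0, node-2=500, node-3=0}
    }
}
```

In the primary-perspective output, `node-1` looks like the problem even though every replica it hosts is caught up.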
I propose we rename these metric labels to reflect what they actually measure, and add new bytes-behind metrics that are computed from the replica's perspective. We can compute them by:
- Storing, on each replica, the checkpoints it receives from the primary
- Starting a timer for each received checkpoint
- Clearing those timers once the replica completes a sync
- Computing replication stats per replica from this state: bytes behind is computed from the metadata sent in the latest received checkpoint, while lag is the running time of the earliest received checkpoint that has not yet been synced.
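The four steps above could look roughly like this on the replica side. This is a minimal sketch, not the proposed implementation: `ReplicaCheckpointTracker` and its `Checkpoint` record are hypothetical, a checkpoint is assumed to carry a monotonically increasing version plus a map of segment file names to lengths, and "bytes behind" is assumed to be the total length of files in the latest received checkpoint that the replica does not yet hold with the same length. A pluggable nanosecond clock keeps it deterministic in tests.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

public class ReplicaCheckpointTracker {
    // Hypothetical stand-in for a checkpoint published by the primary: a monotonically
    // increasing version plus segment file name -> length metadata.
    public record Checkpoint(long version, Map<String, Long> fileLengths) {}

    private final LongSupplier nanoClock;
    // Receive time (nanos) of every checkpoint not yet covered by a completed sync, keyed by version.
    private final NavigableMap<Long, Long> pendingReceivedAt = new TreeMap<>();
    private Checkpoint latestReceived;
    private long lastSyncedVersion = Long.MIN_VALUE;
    private Map<String, Long> localFiles = new HashMap<>();

    public ReplicaCheckpointTracker(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    // Steps 1 and 2: store the received checkpoint and start a timer for it.
    public synchronized void onCheckpointReceived(Checkpoint cp) {
        if (cp.version() <= lastSyncedVersion) {
            return; // already covered by a completed sync
        }
        pendingReceivedAt.putIfAbsent(cp.version(), nanoClock.getAsLong());
        if (latestReceived == null || cp.version() > latestReceived.version()) {
            latestReceived = cp;
        }
    }

    // Step 3: a sync to `synced` completed; clear the timers of every checkpoint it covers.
    public synchronized void onSyncComplete(Checkpoint synced) {
        pendingReceivedAt.headMap(synced.version(), true).clear();
        lastSyncedVersion = Math.max(lastSyncedVersion, synced.version());
        localFiles = new HashMap<>(synced.fileLengths());
    }

    // Step 4a: bytes listed in the latest received checkpoint that the replica does not hold yet.
    public synchronized long bytesBehind() {
        if (latestReceived == null) {
            return 0L;
        }
        long bytes = 0L;
        for (Map.Entry<String, Long> file : latestReceived.fileLengths().entrySet()) {
            if (!file.getValue().equals(localFiles.get(file.getKey()))) {
                bytes += file.getValue();
            }
        }
        return bytes;
    }

    // Step 4b: lag is the running time of the earliest-received checkpoint still pending.
    public synchronized long lagMillis() {
        long now = nanoClock.getAsLong();
        long elapsedNanos = pendingReceivedAt.values().stream()
                .mapToLong(receivedAt -> now - receivedAt)
                .max()
                .orElse(0L);
        return TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
    }
}
```

Keying the timers by version lets a single completed sync clear every checkpoint it supersedes in one `headMap(...).clear()` call, so a replica that skips intermediate checkpoints does not leave stale timers behind.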
As a result, we will have two sets of metrics: one computed from the replica's perspective, based on its latest received checkpoint, which does not account for the time taken to publish checkpoints; and another computed from the primary's perspective, based on its latest refreshed checkpoint, which does account for publish time.
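A small worked example of why the two lag values differ, using hypothetical millisecond timestamps for a single checkpoint: the primary refreshes at t=1000, the replica receives the checkpoint at t=1300 (300 ms of publish time), and the replica finishes syncing at t=1800. The class and method names are illustrative only.

```java
public class LagPerspectives {
    // Primary's perspective: measured from the primary's refresh, so publish time is included.
    public static long primaryPerspectiveLagMillis(long refreshedAtMillis, long nowMillis) {
        return nowMillis - refreshedAtMillis;
    }

    // Replica's perspective: measured from when the replica received the checkpoint,
    // so publish time is excluded.
    public static long replicaPerspectiveLagMillis(long receivedAtMillis, long nowMillis) {
        return nowMillis - receivedAtMillis;
    }

    public static void main(String[] args) {
        long refreshedAt = 1000, receivedAt = 1300, syncedAt = 1800;
        long primaryLag = primaryPerspectiveLagMillis(refreshedAt, syncedAt); // 800
        long replicaLag = replicaPerspectiveLagMillis(receivedAt, syncedAt);  // 500
        System.out.println("publish time = " + (primaryLag - replicaLag) + " ms"); // prints "publish time = 300 ms"
    }
}
```

The gap between the two values is exactly the publish time, which is why reporting both sets of metrics can help distinguish slow publishing from a slow replica.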