[Segment Replication & Remote Translog] Back-pressure and Recovery for lagging replica copies

**Is your feature request related to a problem? Please describe.**
Once we enable [segment-based replication](https://github.com/opensearch-project/OpenSearch/issues/2229) for an index, operations are no longer indexed on the replica; they are only written to its translog for durability. A successful translog write is then the only signal we have, and from it we would assume that the replica is caught up. However, since replicas apply nothing except the segments copied on each checkpoint refresh, a replica that has not successfully processed a checkpoint for a while (for example, due to shard overload or slow I/O) would still be serving reads, and those reads would be stale.
Currently there is no additional mechanism (once the translog has been written on the replica) to apply back pressure on the primary if the replica is slow in processing checkpoints. Remote translog aggravates this: remote translog writes on the primary handle durability altogether, so there would be no per-operation I/O on the replica at all.


**Describe the solution you'd like**
We need mechanisms to apply back pressure on the primary and, as a last resort, to fail the replica copy if it is unable to process any further checkpoints beyond a threshold.
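
To make the proposal concrete, here is a minimal sketch of one possible policy, assuming lag is measured as the number of published checkpoints a replica has not yet processed. The class `ReplicaLagPolicy`, its methods, and both thresholds are hypothetical names for illustration only; none of them are existing OpenSearch APIs, and a real implementation might well measure lag in time or bytes instead.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch, not an OpenSearch class: maps checkpoint lag to an action. */
final class ReplicaLagPolicy {
    enum Action { ACCEPT, APPLY_BACKPRESSURE, FAIL_REPLICA }

    // Assumed thresholds, both counted in unprocessed checkpoints.
    private final long backpressureLag;
    private final long failLag;

    // Latest checkpoint version published by the primary.
    private long primaryCheckpoint = 0L;
    // Latest checkpoint version each replica reports as processed.
    private final Map<String, Long> replicaCheckpoints = new HashMap<>();

    ReplicaLagPolicy(long backpressureLag, long failLag) {
        if (backpressureLag < 0 || failLag < backpressureLag) {
            throw new IllegalArgumentException("expected 0 <= backpressureLag <= failLag");
        }
        this.backpressureLag = backpressureLag;
        this.failLag = failLag;
    }

    /** Record that the primary published a checkpoint (versions only move forward). */
    void onPrimaryCheckpoint(long version) {
        primaryCheckpoint = Math.max(primaryCheckpoint, version);
    }

    /** Record that a replica finished processing a checkpoint. */
    void onReplicaProcessed(String replicaId, long version) {
        replicaCheckpoints.merge(replicaId, version, Long::max);
    }

    /** Number of checkpoints the replica is behind; unknown replicas count from 0. */
    long lag(String replicaId) {
        long processed = replicaCheckpoints.getOrDefault(replicaId, 0L);
        return Math.max(0L, primaryCheckpoint - processed);
    }

    /** Back pressure first; failing the copy is the last resort. */
    Action evaluate(String replicaId) {
        long lag = lag(replicaId);
        if (lag > failLag) {
            return Action.FAIL_REPLICA;
        }
        if (lag > backpressureLag) {
            return Action.APPLY_BACKPRESSURE;
        }
        return Action.ACCEPT;
    }
}
```

In this sketch the primary would call `evaluate` before accepting a write: `APPLY_BACKPRESSURE` would translate into rejecting or delaying the write, and `FAIL_REPLICA` into failing the lagging copy so it can recover from scratch. Keeping the two thresholds separate lets back pressure kick in well before a copy is failed.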

**Describe alternatives you've considered**

**Additional context**


Issue: #4478