[BUG][Segment Replication] With Segment Replication enabled new Replica shards are falling behind Primary until an operation happens on index #5313

Rishikesh1159 · 2022-11-20T06:03:50Z

Describe the bug
With Segment Replication enabled, when a new replica shard is recovered, created, or added to an existing cluster, the replica does not receive a checkpoint (the latest segments) from the primary until an operation is performed on the index. As a result, the replica falls behind until a new operation happens on the index.

Explanation:
-> In the ideal Segment Replication scenario, when a refresh happens on the index and a new reference is opened (which happens only after some operation on the index), the primary shard publishes a checkpoint to the replicas and sends segment files so the replicas can catch up.

-> But when new replica shards are added to an existing cluster, the replicas don't receive any checkpoint from the primary until an operation (index/update/delete) happens on the index. Even if we manually refresh the index, a new reference will not be opened until an operation (index/update/delete) happens on the index, so a checkpoint is never published from the primary to the replica. The replica therefore falls behind.
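The behavior described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyPrimary`, `ToyReplica`, and all of their methods are hypothetical names made up for illustration and are not OpenSearch classes. It only models the rule "a refresh publishes a checkpoint only if there were writes since the last refresh".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; none of these names are real OpenSearch classes.
class ToyReplica {
    long visibleVersion; // segment version this replica can currently search

    ToyReplica(long visibleVersion) {
        this.visibleVersion = visibleVersion;
    }

    // Receiving a checkpoint lets the replica catch up to the primary.
    void onCheckpoint(long version) {
        visibleVersion = Math.max(visibleVersion, version);
    }
}

class ToyPrimary {
    private long segmentVersion = 0;       // version of the latest refreshed segments
    private boolean pendingWrites = false; // true if something changed since the last refresh
    private final List<ToyReplica> replicas = new ArrayList<>();

    // Stands in for any index/update/delete operation.
    void index(String doc) {
        pendingWrites = true;
    }

    void addReplica(ToyReplica replica) {
        replicas.add(replica);
    }

    long segmentVersion() {
        return segmentVersion;
    }

    // Returns true only when a checkpoint was published to the replicas.
    boolean refresh() {
        if (!pendingWrites) {
            return false; // nothing changed: no new reference, so no checkpoint
        }
        pendingWrites = false;
        segmentVersion++;
        for (ToyReplica replica : replicas) {
            replica.onCheckpoint(segmentVersion);
        }
        return true;
    }
}

class ToyRefreshDemo {
    public static void main(String[] args) {
        ToyPrimary primary = new ToyPrimary();
        primary.index("doc-1");
        primary.refresh(); // primary now has newer segments

        ToyReplica replica = new ToyReplica(0); // new replica starts from older state
        primary.addReplica(replica);

        // A manual refresh without new writes publishes nothing.
        System.out.println("published=" + primary.refresh()
            + " replica=" + replica.visibleVersion
            + " primary=" + primary.segmentVersion());
    }
}
```

In this model the new replica stays behind the primary after any number of refreshes, and only catches up once another `index` call is followed by a refresh.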

To Reproduce
Steps to reproduce the behavior:

1. Start a cluster and create a new index with a primary shard.
2. Insert some documents into the index.
3. Add a new replica shard to the existing cluster.
4. Search for the docs inserted in step 2 on the new replica.
5. The search on the new replica returns empty results even though the documents were inserted successfully and are present on the primary.

Expected behavior
-> A search for documents on the replica should not return empty results if the documents were successfully inserted earlier.

Expected Solution
-> In segment replication, when a new replica shard is added to an existing cluster, it goes through the process of peer recovery and is finally marked as STARTED.
-> After peer recovery is completed and before the shard is marked as STARTED, we have to force the new replica shard to start a round of segment replication to fetch the latest segment files from the primary shard. Only after this replication event is completed should we mark the shard as STARTED.
-> This way the replica shard will have all the latest segment files before it is STARTED and ready to be searched.
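The ordering proposed above can be sketched as a toy model. This is a minimal illustration, not the actual fix: `ToyReplicaShard`, `ToyRecoveryOrdering`, and the segment names `_0` and `_1` are all hypothetical, and real segment replication does more than add files to a set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of the proposed ordering; not OpenSearch code.
class ToyReplicaShard {
    final Set<String> segments = new TreeSet<>();
    final List<String> events = new ArrayList<>();
    boolean started = false;

    // Step 1: peer recovery copies whatever files recovery provides, which may be stale.
    void peerRecover(Set<String> recoveredFiles) {
        segments.addAll(recoveredFiles);
        events.add("PEER_RECOVERY_DONE");
    }

    // Step 2: a forced round of segment replication fetches the primary's latest files.
    void forceSegmentReplication(Set<String> primaryLatest) {
        segments.addAll(primaryLatest);
        events.add("SEGMENT_REPLICATION_DONE");
    }

    // Step 3: only now is the shard marked STARTED and made searchable.
    void markStarted() {
        started = true;
        events.add("STARTED");
    }
}

class ToyRecoveryOrdering {
    static ToyReplicaShard addReplica(Set<String> recoveredFiles, Set<String> primaryLatest) {
        ToyReplicaShard replica = new ToyReplicaShard();
        replica.peerRecover(recoveredFiles);
        replica.forceSegmentReplication(primaryLatest);
        replica.markStarted();
        return replica;
    }

    public static void main(String[] args) {
        // Assumed example: recovery delivered only _0, while the primary already has _1.
        ToyReplicaShard replica = addReplica(Set.of("_0"), Set.of("_0", "_1"));
        System.out.println(replica.events + " " + replica.segments);
    }
}
```

The point of the sketch is only the order of the three steps: the replica already holds every segment the primary has by the time `markStarted` runs.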


mch2 · 2022-11-21T18:45:22Z

@Rishikesh1159 This is because phase 1 of recovery copies from the primary's latest safe commit. Any segments created after that safe commit are not copied during recovery; they are copied only on the first replication event after the replica is started.

I think it's reasonable to force a round of replication here so that we are not dependent on the primary receiving consistent index load and refreshing. I think we could do this by triggering a round of segrep when RecoveryListener resolves, before IndicesClusterStateService marks the shard as active.
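To make the callback chaining concrete, here is a rough sketch of the idea. Everything in it is hypothetical: `StepListener` and `ListenerChainSketch` are invented for illustration and do not reflect the real `RecoveryListener` or `IndicesClusterStateService` signatures. Failing the shard when the forced replication round fails is also an assumption, not something decided in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback type, loosely modeled on a completion listener.
interface StepListener {
    void onDone();
    void onFailure(Exception e);
}

class ListenerChainSketch {
    final List<String> log = new ArrayList<>();

    // Stand-in for peer recovery; completes immediately in this sketch.
    void recover(StepListener listener) {
        log.add("recovery done");
        listener.onDone();
    }

    // Stand-in for a forced round of segment replication.
    void replicate(boolean succeed, StepListener listener) {
        if (succeed) {
            log.add("replication done");
            listener.onDone();
        } else {
            listener.onFailure(new IllegalStateException("replication failed"));
        }
    }

    // The shard is marked STARTED only from inside the replication callback.
    void startReplica(boolean replicationSucceeds) {
        recover(new StepListener() {
            @Override
            public void onDone() {
                replicate(replicationSucceeds, new StepListener() {
                    @Override
                    public void onDone() {
                        log.add("shard marked STARTED");
                    }

                    @Override
                    public void onFailure(Exception e) {
                        // Assumption: a failed forced round fails the shard.
                        log.add("shard failed: " + e.getMessage());
                    }
                });
            }

            @Override
            public void onFailure(Exception e) {
                log.add("shard failed: " + e.getMessage());
            }
        });
    }

    public static void main(String[] args) {
        ListenerChainSketch sketch = new ListenerChainSketch();
        sketch.startReplica(true);
        System.out.println(sketch.log);
    }
}
```

Because the "mark STARTED" step lives inside the replication listener, there is no path in this sketch where the shard becomes active without the forced round having completed.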

Rishikesh1159 · 2022-11-22T04:13:27Z

Thanks @mch2, yes, what you said is correct. Forcing a round of replication while recovering/creating a new replica shard makes sense and would solve this bug. I see two possible solutions to force segment replication during recovery:

1. As you mentioned, we can use recoveryListener to trigger a replication event. I implemented this solution in PR: [Segment Replication] Trigger a round of replication for replica shards during peer recovery when segment replication is enabled #5332
2. Another possible way is to trigger a publish checkpoint from the primary when the finalize recovery step of the replica shard is completed. We can add the code block below after this line in RecoverySourceHandler:

if (shard.indexSettings().isSegRepEnabled() && request.isPrimaryRelocation() == false) {
    shard.sendCheckpoint(shard);
}

and add the following method to IndexShard:

public void sendCheckpoint(IndexShard recoverySource) {
    this.checkpointPublisher.publish(recoverySource);
}

For now I am going with solution 1, which you suggested. If needed, we can discuss solution 2 and use it instead.

Rishikesh1159 added the bug, untriaged, and distributed framework labels and removed the untriaged label on Nov 20, 2022

xuezhou25 assigned Rishikesh1159 Nov 21, 2022

Rishikesh1159 mentioned this issue Nov 22, 2022

[Segment Replication] Trigger a round of replication for replica shards during peer recovery when segment replication is enabled #5332

Merged

6 tasks

Rishikesh1159 closed this as completed in #5332 Dec 12, 2022

Rishikesh1159 mentioned this issue Dec 12, 2022

[Backport 2.x] [Segment Replication] Trigger a round of replication for replica shards during peer recovery when segment replication is enabled #5533

Merged

6 tasks

dreamer-89 mentioned this issue Jan 26, 2023

[Segment Replication] Revisit network timeout settings to avoid timeout exceptions #6027

Closed
