Skip to content

Replica recovery on follower can fail after recovery from remote #39000

Closed
@jasontedor

Description

@jasontedor

Today during peer recovery, we replay operations from the translog of the primary to the replica. This is to build a history of operations on the replica, in case that it would become a primary. Unfortunately, this is currently fatal for peer recovery of a follower shard after the primary shard has recovered remotely.

With recovery from remote, we copy over the index files and there is no translog replay phase. Assume the simple case of a newly created follower doing its initial recovery from the leader. If the leader does a flush before the follower initiates recovery from remote, after recovery from remote the follower will be fully caught up all primary shards of the follower will have empty translogs. When a replica shard of the follower attempts to recover from the primary shard of the follower, we want to replay translog from the local checkpoint of the commit, to bake a history of operations for it. Since the primary shards of the follower have empty translogs, this replay can not happen and recovery will fail.

Immediately I see two possibilities:

  • InternalEngine#readHistoryOperations could serve operations from the Lucene history instead of the translog; however, this interplays with the need for shard history retention leases to interplay with recovery to ensure that these operations exist
  • the primary shards of the follower could build their own translog from Lucene history after remotely recovering files

The purpose of this issue is to tracker this as a blocker for all versions of CCR that support recovery from remote, and to initiate discussion on these options (and others).

Relates #38633

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions