Description
1. Peer-recovery
Today we don't store the auto-generated timestamp of indexing operations in Lucene, so LuceneChangesSnapshot always assigns -1 to the index operations it reads. This looks innocent, but it produces duplicate documents on a replica in the following test.
public void testRetryAppendOnlyInRecoveryAndReplication() throws Exception {
    Settings settings = Settings.builder()
        .put(IndexSettings.INDEX_SOFT_DELETES_SETTING.getKey(), true)
        .build();
    try (ReplicationGroup shards = createGroup(0, settings)) {
        shards.startAll();
        final IndexRequest originalRequest = new IndexRequest(
            index.getName(), "type").source("{}", XContentType.JSON);
        originalRequest.process(Version.CURRENT, null, index.getName());
        IndexRequest retryRequest = new IndexRequest();
        try (BytesStreamOutput out = new BytesStreamOutput()) {
            originalRequest.writeTo(out);
            try (StreamInput in = out.bytes().streamInput()) {
                retryRequest.readFrom(in);
            }
        }
        retryRequest.onRetry();
        shards.index(retryRequest);
        IndexShard replica = shards.addReplica();
        shards.recoverReplica(replica); // timestamp on replica is -1
        shards.assertAllEqual(1);
        shards.index(originalRequest); // we optimize this request on replica
        shards.assertAllEqual(1);
    }
}
To fix this, we need to assign a value that is at least the original auto-generated timestamp of the index request to the corresponding operation read from LuceneChangesSnapshot. Here we can use the latest auto-generated timestamp seen by the Engine.
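A minimal sketch of that fallback, assuming a hypothetical helper (resolveAutoGeneratedTimestamp) and a maxSeenAutoIdTimestamp value taken from the Engine; this is not the actual LuceneChangesSnapshot code, only an illustration of the idea:

// Hypothetical sketch: choose a safe auto-generated timestamp for an index
// operation read from the changes snapshot.
final class AutoIdTimestampFallback {

    static long resolveAutoGeneratedTimestamp(long storedTimestamp, long maxSeenAutoIdTimestamp) {
        // Lucene does not persist the timestamp today, so storedTimestamp is -1 for
        // every operation read from the snapshot. Falling back to the engine's latest
        // auto-generated timestamp yields a value that is at least the original
        // request's timestamp, so the recovering replica's timestamp bookkeeping
        // advances and a later delivery of the same request is not blindly appended
        // as a second document.
        return storedTimestamp >= 0 ? storedTimestamp : maxSeenAutoIdTimestamp;
    }
}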
2. Optimize indexing on a FollowingEngine in CCR
We currently disable this optimization for index requests whose origin is recovery (retry is always true). To enable this optimization in CCR:
- We need to make sure that a FollowingEngine processes an append-only operation only once. This can be done with LocalCheckpointTracker (see the first sketch after this list).
- We need to store the retry flag in the Lucene index and extend Translog#Index to include it. This should be fast with a single-value DocValues field (see the second sketch after this list).
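A sketch of the first point, assuming a hypothetical hook name (hasProcessedBefore) and using a plain set as a stand-in for LocalCheckpointTracker, which tracks processed sequence numbers far more compactly; it is not the real FollowingEngine code:

import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the LocalCheckpointTracker-based check: an append-only
// operation is indexed only if its sequence number has not been processed before,
// so a retried delivery of the same operation becomes a no-op.
final class AppendOnlyDedupSketch {

    private final Set<Long> processedSeqNos = new HashSet<>();

    // Returns true if the operation with this sequence number was already processed
    // and should be skipped instead of being indexed again.
    synchronized boolean hasProcessedBefore(long seqNo) {
        return !processedSeqNos.add(seqNo);
    }
}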
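And a sketch of the second point, assuming a hypothetical field name (_retry); it only shows how a single-value doc-values field could carry the flag, not how Translog#Index itself would be extended:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericDocValuesField;

// Hypothetical helper: persists the retry flag as a single-value numeric doc-values
// field so it can be read back when operations are replayed from the Lucene index.
final class RetryFlagField {

    static final String NAME = "_retry"; // assumed field name, not an existing field

    static void addTo(Document doc, boolean isRetry) {
        doc.add(new NumericDocValuesField(NAME, isRetry ? 1L : 0L));
    }
}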
This is a subtask of #30086.