Uses auto generated timestamp with soft-deletes #33656

Closed
@dnhatn

1. Peer-recovery

Today we don't store the auto-generated timestamp of indexing operations in Lucene, so LuceneChangesSnapshot always assigns -1 to every index operation it returns. This looks innocent, but it produces a duplicate document on a replica in the following test.

public void testRetryAppendOnlyInRecoveryAndReplication() throws Exception {
    Settings settings = Settings.builder()
        .put(IndexSettings.INDEX_SOFT_DELETES_SETTING.getKey(), true)
        .build();
    try (ReplicationGroup shards = createGroup(0, settings)) {
        shards.startAll();
        final IndexRequest originalRequest = new IndexRequest(
            index.getName(), "type").source("{}", XContentType.JSON);
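        // assigns the auto-generated id and timestamp
        originalRequest.process(Version.CURRENT, null, index.getName());
        // copy via the wire format so the retry shares the same id and timestamp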
        IndexRequest retryRequest = new IndexRequest();
        try (BytesStreamOutput out = new BytesStreamOutput()) {
            originalRequest.writeTo(out);
            try (StreamInput in = out.bytes().streamInput()) {
                retryRequest.readFrom(in);
            }
        }
        retryRequest.onRetry();
        shards.index(retryRequest);
        IndexShard replica = shards.addReplica();
        shards.recoverReplica(replica); // recovered op carries timestamp -1
        shards.assertAllEqual(1);
        // not a retry and newer than -1, so the replica optimizes it as a blind append
        shards.index(originalRequest);
        shards.assertAllEqual(1); // fails: the replica now holds two copies
    }
}

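To fix this, LuceneChangesSnapshot must assign each index operation a timestamp that is at least the original auto-generated timestamp of its request. With -1, the replica never learns that an operation with the original timestamp has already been indexed, so when the original request arrives (not a retry, with a newer timestamp) it takes the append-only fast path and adds a second copy. Here we can use the latest auto-generated timestamp tracked by the Engine.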
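
A simplified, hypothetical Java model of the replica's append-only check may make the failure and the fix concrete. It is not the InternalEngine code: `Replica`, `maxUnsafeAutoIdTimestamp`, `docsAfterRecovery` and the timestamp 100 are illustrative assumptions, and it assumes recovered operations arrive as retries (as noted for recovery-origin requests below). It replays the test twice, once with the snapshot timestamp -1 and once with the engine's latest timestamp.

```java
import java.util.ArrayList;
import java.util.List;

public class RecoveryTimestampSketch {

    /** An index op with an auto-generated id, as seen by a replica. */
    record Op(String id, long autoIdTimestamp, boolean isRetry) {}

    /** Hypothetical model of a replica's append-only optimization. */
    static final class Replica {
        // Largest timestamp of any retried append-only op seen so far.
        private long maxUnsafeAutoIdTimestamp = -1;
        private final List<String> docs = new ArrayList<>();

        void index(Op op) {
            boolean mayExist;
            if (op.isRetry()) {
                maxUnsafeAutoIdTimestamp = Math.max(maxUnsafeAutoIdTimestamp, op.autoIdTimestamp());
                mayExist = true;
            } else {
                // A non-retry op newer than every seen retry is assumed to be new.
                mayExist = op.autoIdTimestamp() <= maxUnsafeAutoIdTimestamp;
            }
            if (mayExist) {
                docs.remove(op.id()); // update path: replace an existing copy, if any
            }
            docs.add(op.id()); // both paths add the doc; only the update path removed a prior copy
        }

        int docCount() {
            return docs.size();
        }
    }

    /** Simulates the test: recover from a snapshot op, then deliver the delayed original. */
    static int docsAfterRecovery(long snapshotTimestamp) {
        Op original = new Op("doc-1", 100L, false);
        Replica replica = new Replica();
        // Lucene does not store the timestamp, so the snapshot op gets snapshotTimestamp.
        replica.index(new Op("doc-1", snapshotTimestamp, true));
        replica.index(original);
        return replica.docCount();
    }

    public static void main(String[] args) {
        long maxSeenOnPrimary = 100L; // hypothetical: latest auto-id timestamp seen by the engine
        System.out.println("with -1: " + docsAfterRecovery(-1L));
        System.out.println("with max seen: " + docsAfterRecovery(maxSeenOnPrimary));
    }
}
```

Running `main` prints `with -1: 2` followed by `with max seen: 1`. Using the latest timestamp is conservative: later non-retry operations whose timestamps do not exceed it take the slower update path, which costs a lookup but cannot create a duplicate.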

2. Optimize indexing on a FollowingEngine in CCR

We currently disable the append-only optimization for index operations whose origin is recovery (their retry flag is always true). To enable this optimization in CCR:

  1. We need to make sure that a FollowingEngine processes each append-only operation only once. This can be done using the LocalCheckpointTracker.

  2. We need to store the retry flag in the Lucene index and extend Translog#Index to include this flag. This should be cheap with a single-valued DocValues field.
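
The two steps above can be sketched with a hypothetical Java model of a following engine. `ProcessedSeqNos` is a simplified stand-in for LocalCheckpointTracker (not its API), the `isRetry` field on `Op` represents the flag that step 2 would persist in Lucene and Translog#Index, and the model assumes the follower reuses the same timestamp check as a replica. Without the seq-no check, a re-delivered non-retry op would be appended twice.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FollowerSketch {

    /** A leader op as shipped to the follower, including the (newly stored) retry flag. */
    record Op(long seqNo, String id, long autoIdTimestamp, boolean isRetry) {}

    /** Hypothetical stand-in for LocalCheckpointTracker: remembers processed seq nos. */
    static final class ProcessedSeqNos {
        private final Set<Long> aboveCheckpoint = new HashSet<>();
        private long checkpoint = -1; // every seqNo <= checkpoint has been processed

        boolean hasProcessed(long seqNo) {
            return seqNo <= checkpoint || aboveCheckpoint.contains(seqNo);
        }

        void markProcessed(long seqNo) {
            aboveCheckpoint.add(seqNo);
            // Advance the checkpoint over any contiguous run of processed seq nos.
            while (aboveCheckpoint.remove(checkpoint + 1)) {
                checkpoint++;
            }
        }
    }

    static final class Follower {
        private final ProcessedSeqNos tracker = new ProcessedSeqNos();
        private final List<String> docs = new ArrayList<>();
        private long maxUnsafeAutoIdTimestamp = -1;
        int optimizedAppends = 0;

        void apply(Op op) {
            if (tracker.hasProcessed(op.seqNo())) {
                return; // re-delivered op: each append-only op is processed only once
            }
            boolean mayExist;
            if (op.isRetry()) {
                maxUnsafeAutoIdTimestamp = Math.max(maxUnsafeAutoIdTimestamp, op.autoIdTimestamp());
                mayExist = true;
            } else {
                mayExist = op.autoIdTimestamp() <= maxUnsafeAutoIdTimestamp;
            }
            if (mayExist) {
                docs.remove(op.id()); // update path: replace an existing copy, if any
            } else {
                optimizedAppends++; // optimized path: no lookup needed
            }
            docs.add(op.id());
            tracker.markProcessed(op.seqNo());
        }

        int docCount() {
            return docs.size();
        }
    }

    public static void main(String[] args) {
        Follower follower = new Follower();
        Op original = new Op(0, "doc-1", 100L, false);
        follower.apply(original);
        follower.apply(original); // hypothetical: the same op fetched again after a failure
        follower.apply(new Op(1, "doc-1", 100L, true)); // hypothetical: the leader's retry of the request
        System.out.println("docs=" + follower.docCount() + " optimizedAppends=" + follower.optimizedAppends);
    }
}
```

Running `main` prints `docs=1 optimizedAppends=1`: the first delivery takes the fast path, the re-delivery is skipped by the seq-no check, and the retry replaces the existing copy instead of adding one.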

@s1monw WDYT? /cc @bleskes

This is a subtask of #30086.

Metadata

Labels

:Distributed Indexing/CCR (Issues around the Cross Cluster State Replication features)