[CI] InternalEngineTests.testLookupSeqNoByIdInLucene fails after prune ID merge policy #42979

Closed
@henningandersen

My PR build failed:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request-1/193/testReport/org.elasticsearch.index.engine/InternalEngineTests/testLookupSeqNoByIdInLucene/

with:

java.lang.NullPointerException
	at __randomizedtesting.SeedInfo.seed([DDA237697902E2F7:858F7759E359C12]:0)
	at org.elasticsearch.index.engine.InternalEngineTests.lambda$testLookupSeqNoByIdInLucene$50(InternalEngineTests.java:4004)
	at org.elasticsearch.index.engine.InternalEngineTests.testLookupSeqNoByIdInLucene(InternalEngineTests.java:4031)

I can easily reproduce this with the seed DDA237697902E2F7. I cut the test case down to the following, which indexes a doc, does a refresh, deletes the doc and then merges, losing the ability to look up the doc/seqno by ID:

    public void testLookupSeqNoByIdInLucene2() throws Exception {
        Settings.Builder settings = Settings.builder()
            .put(defaultSettings.getSettings())
            .put(IndexSettings.INDEX_SOFT_DELETES_SETTING.getKey(), true);
        final IndexMetaData indexMetaData = IndexMetaData.builder(defaultSettings.getIndexMetaData()).settings(settings).build();
        final IndexSettings indexSettings = IndexSettingsModule.newIndexSettings(indexMetaData);
        Map<String, Engine.Operation> latestOps = new HashMap<>(); // id -> latest seq_no
        try (Store store = createStore();
             InternalEngine engine = createEngine(config(indexSettings, store, createTempDir(), newMergePolicy(), null))) {
            final ParsedDocument doc = EngineTestCase.createParsedDoc("23", null);
            // Index the doc as a replica operation with seq_no 1.
            engine.index(new Engine.Index(EngineTestCase.newUid(doc), doc, 1, primaryTerm.get(),
                1, null, Engine.Operation.Origin.REPLICA, threadPool.relativeTimeInMillis(), -1, true, UNASSIGNED_SEQ_NO, 0L));
            // Refresh so that the indexed doc is in its own segment before the delete arrives.
            engine.refresh("test");
            // Delete the same doc as a replica operation with seq_no 3.
            engine.delete(new Engine.Delete(doc.type(), doc.id(), EngineTestCase.newUid(doc), 3, primaryTerm.get(),
                1, null, Engine.Operation.Origin.REPLICA, threadPool.relativeTimeInMillis(), UNASSIGNED_SEQ_NO, 0L));
            try (Searcher searcher = engine.acquireSearcher("test", Engine.SearcherScope.INTERNAL)) {
                logger.info("before merge: " + searcher.reader().numDocs() + ", " + searcher.reader().maxDoc());
            }
            // Force a merge; with the prune ID merge policy in place, the lookup below fails afterwards.
            engine.forceMerge(true);
            try (Searcher searcher = engine.acquireSearcher("test", Engine.SearcherScope.INTERNAL)) {
                logger.info("after merge: " + searcher.reader().numDocs() + ", " + searcher.reader().maxDoc());
                // Look up the doc/seqno by ID; this returns null after the merge.
                DocIdAndSeqNo docIdAndSeqNo = VersionsAndSeqNoResolver.loadDocIdAndSeqNo(searcher.reader(), newUid("23"));
                assertNotNull(docIdAndSeqNo);
            }
        }
    }

I tried disabling the new PrunePostingsMergePolicy, and this made the problem go away. As far as I can see, we use this lookup by ID in InternalEngine.planIndexingAsNonPrimary and planDeletionAsNonPrimary when the seqNo received is below the local checkpoint.
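
To make the failure mode concrete, here is a toy model of what I believe is happening. It is only a sketch: PrunedIdLookupSketch and all of its methods are hypothetical and do not exist in Elasticsearch or Lucene. A map stands in for the ID postings, and the merge step is assumed to drop the ID entries of every doc that no longer has a live copy.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: these names are hypothetical, not Elasticsearch or Lucene APIs.
final class PrunedIdLookupSketch {
    // Stand-in for the ID postings: id -> highest seq_no that can still be found by ID.
    private final Map<String, Long> idPostings = new HashMap<>();
    // IDs that currently have a live (not soft-deleted) copy.
    private final Set<String> liveIds = new HashSet<>();

    void index(String id, long seqNo) {
        idPostings.merge(id, seqNo, Math::max);
        liveIds.add(id);
    }

    // Soft delete: the tombstone is still findable by ID until a merge prunes it.
    void delete(String id, long seqNo) {
        idPostings.merge(id, seqNo, Math::max);
        liveIds.remove(id);
    }

    // Assumed pruning behaviour: drop ID entries for every ID without a live copy.
    void forceMergeWithPruning() {
        idPostings.keySet().retainAll(liveIds);
    }

    // Returns null when the ID can no longer be found.
    Long lookupSeqNo(String id) {
        return idPostings.get(id);
    }

    public static void main(String[] args) {
        PrunedIdLookupSketch sketch = new PrunedIdLookupSketch();
        sketch.index("23", 1);
        sketch.delete("23", 3);
        System.out.println("before merge: " + sketch.lookupSeqNo("23")); // prints "before merge: 3"
        sketch.forceMergeWithPruning();
        System.out.println("after merge: " + sketch.lookupSeqNo("23"));  // prints "after merge: null"
    }
}
```

This mirrors the sequence in the reproducer (index, delete, merge, lookup), with the null result corresponding to the failed assertNotNull.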

I am unsure whether this part can now be removed, so that all ops below the local checkpoint are simply always treated as stale. I am therefore raising this issue to gather input.
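
The two options can be sketched side by side. This is a simplified illustration, not the actual InternalEngine code: BelowCheckpointPlanSketch, Plan and both methods are hypothetical names, both methods only model ops that the caller has already found to be below the local checkpoint, and the handling of a null lookup result in the first variant is my assumption.

```java
// Simplified illustration only: hypothetical names, not the real InternalEngine logic.
final class BelowCheckpointPlanSketch {
    enum Plan { APPLY, SKIP_AS_STALE }

    // Variant A: decide based on the result of the lookup by ID.
    // foundSeqNo is null when the lookup finds nothing, e.g. after the ID was pruned.
    static Plan planWithLookup(long opSeqNo, Long foundSeqNo) {
        if (foundSeqNo == null) {
            // Assumption: "not found" is read as "nothing newer exists", so the op is applied.
            return Plan.APPLY;
        }
        return opSeqNo > foundSeqNo ? Plan.APPLY : Plan.SKIP_AS_STALE;
    }

    // Variant B (the question raised here): skip the lookup and always treat the op as stale.
    static Plan planAlwaysStale(long opSeqNo) {
        return Plan.SKIP_AS_STALE;
    }
}
```

Under these assumptions, an op with seqNo 1 arriving after the delete with seqNo 3 is skipped by variant A while the lookup still works, but applied once the lookup returns null; variant B skips it in both cases.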
