My PR build failed with:
java.lang.NullPointerException
at __randomizedtesting.SeedInfo.seed([DDA237697902E2F7:858F7759E359C12]:0)
at org.elasticsearch.index.engine.InternalEngineTests.lambda$testLookupSeqNoByIdInLucene$50(InternalEngineTests.java:4004)
at org.elasticsearch.index.engine.InternalEngineTests.testLookupSeqNoByIdInLucene(InternalEngineTests.java:4031)
I can easily reproduce this with the seed DDA237697902E2F7. I cut the test case down to the following, which indexes a doc, does a refresh, deletes the doc and then merges, losing the ability to look up the doc/seqno by ID:
public void testLookupSeqNoByIdInLucene2() throws Exception {
    Settings.Builder settings = Settings.builder()
        .put(defaultSettings.getSettings())
        .put(IndexSettings.INDEX_SOFT_DELETES_SETTING.getKey(), true);
    final IndexMetaData indexMetaData = IndexMetaData.builder(defaultSettings.getIndexMetaData()).settings(settings).build();
    final IndexSettings indexSettings = IndexSettingsModule.newIndexSettings(indexMetaData);
    Map<String, Engine.Operation> latestOps = new HashMap<>(); // id -> latest seq_no
    try (Store store = createStore();
         InternalEngine engine = createEngine(config(indexSettings, store, createTempDir(), newMergePolicy(), null))) {
        final ParsedDocument doc = EngineTestCase.createParsedDoc("23", null);
        engine.index(new Engine.Index(EngineTestCase.newUid(doc), doc, 1, primaryTerm.get(),
            1, null, Engine.Operation.Origin.REPLICA, threadPool.relativeTimeInMillis(), -1, true, UNASSIGNED_SEQ_NO, 0L));
        engine.refresh("test");
        engine.delete(new Engine.Delete(doc.type(), doc.id(), EngineTestCase.newUid(doc), 3, primaryTerm.get(),
            1, null, Engine.Operation.Origin.REPLICA, threadPool.relativeTimeInMillis(), UNASSIGNED_SEQ_NO, 0L));
        try (Searcher searcher = engine.acquireSearcher("test", Engine.SearcherScope.INTERNAL)) {
            logger.info("before merge: " + searcher.reader().numDocs() + ", " + searcher.reader().maxDoc());
        }
        engine.forceMerge(true);
        try (Searcher searcher = engine.acquireSearcher("test", Engine.SearcherScope.INTERNAL)) {
            logger.info("after merge: " + searcher.reader().numDocs() + ", " + searcher.reader().maxDoc());
            DocIdAndSeqNo docIdAndSeqNo = VersionsAndSeqNoResolver.loadDocIdAndSeqNo(searcher.reader(), newUid("23"));
            assertNotNull(docIdAndSeqNo);
        }
    }
}
I tried disabling the new PrunePostingsMergePolicy, and this made the problem go away. As far as I can see, we do use this lookup by ID in InternalEngine.planIndexingAsNonPrimary and planDeletionAsNonPrimary in the case where the seqNo received is below the local checkpoint.
I am unsure whether this part can now be removed, so that we simply always treat all ops below the local checkpoint as stale. I am therefore raising this issue to gather input.
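To make the two options concrete, here is a toy sketch. It is not the engine code: StaleOpSketch, idToSeqNo, lookupSeqNoById and isStaleWithoutLookup are hypothetical names, the map merely stands in for the by-ID lookup in Lucene, and the inclusive comparison against the local checkpoint is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleOpSketch {
    /** Hypothetical stand-in for the by-ID lookup in Lucene: id -> seq_no. */
    static final Map<String, Long> idToSeqNo = new HashMap<>();

    /** Option A: consult the by-ID lookup; returns null once the entry has been pruned. */
    static Long lookupSeqNoById(String id) {
        return idToSeqNo.get(id);
    }

    /** Option B: no lookup; every op at or below the local checkpoint counts as stale (assumed inclusive). */
    static boolean isStaleWithoutLookup(long opSeqNo, long localCheckpoint) {
        return opSeqNo <= localCheckpoint;
    }

    public static void main(String[] args) {
        idToSeqNo.put("23", 1L);   // index the doc
        idToSeqNo.remove("23");    // toy model of delete + merge pruning the ID entry
        System.out.println(lookupSeqNoById("23"));      // null
        System.out.println(isStaleWithoutLookup(1, 3)); // true
        System.out.println(isStaleWithoutLookup(4, 3)); // false
    }
}
```

With option A, any caller has to cope with a null result after pruning. Option B never touches the ID entries, but it is only safe if nothing else depends on finding the doc for ops below the local checkpoint, which is the open question here.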