Description
When running the KNN benchmark with -parentJoin enabled and metadata downloaded via initial_setup.py -download, the benchmark fails during brute-force KNN evaluation with an IllegalStateException: the index size check expects 500000 documents, but the index contains 508773.
Full output:
Parent join metaFile columns: wiki_id | para_id
indexed 25000 child documents, with 276 parents
indexed 50000 child documents, with 592 parents
indexed 75000 child documents, with 949 parents
indexed 100000 child documents, with 1322 parents
indexed 125000 child documents, with 1725 parents
indexed 150000 child documents, with 2107 parents
indexed 175000 child documents, with 2527 parents
indexed 200000 child documents, with 2938 parents
indexed 225000 child documents, with 3379 parents
indexed 250000 child documents, with 3803 parents
indexed 275000 child documents, with 4281 parents
indexed 300000 child documents, with 4734 parents
indexed 325000 child documents, with 5193 parents
indexed 350000 child documents, with 5696 parents
indexed 375000 child documents, with 6166 parents
indexed 400000 child documents, with 6697 parents
indexed 425000 child documents, with 7198 parents
indexed 450000 child documents, with 7734 parents
indexed 475000 child documents, with 8250 parents
indexed 500000 child documents, with 8772 parents
Indexed 500000 documents with 8772 parent docs. now flush
now IndexWriter.commit()
done ConcurrentMergeScheduler.sync()
Indexed 500000 docs in 48 seconds
reindex takes 48.69 sec
index has 4 segments: segments_5: _b(11.0.0):C406191:[diagnostics={source=merge, os.arch=aarch64, java.runtime.version=24.0.1+9-30, mergeFactor=10, java.vendor=Oracle Corporation, os=Mac OS X, os.version=14.5, timestamp=1750693105755, mergeMaxNumSegments=-1, lucene.version=11.0.0}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}] :id=1vo1e9gh144p3y5s9lpb0f0wx _1(11.0.0):C40534:[diagnostics={source=flush, lucene.version=11.0.0, os.version=14.5, os.arch=aarch64, java.vendor=Oracle Corporation, os=Mac OS X, java.runtime.version=24.0.1+9-30, timestamp=1750693080822}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}] :id=1vo1e9gh144p3y5s9lpb0f0vw _c(11.0.0):C40771:[diagnostics={source=flush, lucene.version=11.0.0, os.version=14.5, os.arch=aarch64, java.vendor=Oracle Corporation, os=Mac OS X, java.runtime.version=24.0.1+9-30, timestamp=1750693108866}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}] :id=1vo1e9gh144p3y5s9lpb0f0ws _d(11.0.0):C21277:[diagnostics={source=flush, lucene.version=11.0.0, os.version=14.5, os.arch=aarch64, java.vendor=Oracle Corporation, os=Mac OS X, java.runtime.version=24.0.1+9-30, timestamp=1750693110354}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}] :id=1vo1e9gh144p3y5s9lpb0f0wv
_b: 1176.9 MB
_1: 117.8 MB
_c: 117.8 MB
_d: 61.4 MB
encodingByteSize=4 origByteSize=4
realEncodingByteSize=4.0
index disk usage is 1473.91 MB
vector disk usage is 1464.84 MB
vector RAM usage is 1464.84 MB
now compute brute-force exact KNN matches
computing true nearest neighbors of 1000 target vectors
parentJoin = true
now compute brute-force KNN hits for 1000 query vectors from "/Users/lukewilner/lucene-bench/data/cohere-wikipedia-queries-768d.vec" starting at query index 0
Exception in thread "main" java.lang.IllegalStateException: index size mismatch, expected 500000 but index has 508773
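
A quick arithmetic check on the numbers in the output above may help narrow this down. The sketch below is not part of the benchmark. It assumes the number after `C` in each segment entry is that segment's document count, and the variable names are made up for illustration. All values are copied from the log.

```python
# Per-segment document counts copied from the "index has 4 segments" line.
# Assumption: the number following "C" in each entry is the segment's doc count.
segment_doc_counts = {"_b": 406191, "_1": 40534, "_c": 40771, "_d": 21277}

child_docs = 500000         # from "Indexed 500000 documents with 8772 parent docs"
parent_docs_logged = 8772   # from the same log line

# Total documents physically present in the index, across all segments.
total_in_index = sum(segment_doc_counts.values())

# How many documents the index holds beyond the expected child count.
surplus = total_in_index - child_docs

print("total docs in index:", total_in_index)
print("surplus over expected:", surplus)
print("surplus minus logged parents:", surplus - parent_docs_logged)
```

Under that assumption the segment counts sum to 508773, matching the figure in the exception, and the surplus over 500000 is 8773, one more than the 8772 parents reported in the last progress line. That would be consistent with the size check counting only child documents while the index also contains the parent documents, but this is an inference from the log, not a confirmed diagnosis.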