[BUG] BWC rolling upgrade test fails for SparseEncoder processor during batch ingestion #1142

@vibrantvarun

What is the bug?

The BatchIngestionIT.testBatchIngestion_SparseEncodingProcessor_E2EFlow test fails during the testAgainstOneThirdUpgradedCluster stage with the following error:

> Task :qa:rolling-upgrade:testAgainstOneThirdUpgradedCluster
REPRODUCE WITH: ./gradlew ':qa:rolling-upgrade:testAgainstOneThirdUpgradedCluster' --tests "org.opensearch.neuralsearch.bwc.rolling.BatchIngestionIT.testBatchIngestion_SparseEncodingProcessor_E2EFlow" -Dtests.seed=801B4B74838557A -Dtests.security.manager=false -Dtests.bwc.version=2.19.0-SNAPSHOT -Dtests.locale=sw-TZ -Dtests.timezone=America/Bahia_Banderas -Druntime.java=21
Suite: Test class org.opensearch.neuralsearch.bwc.rolling.BatchIngestionIT
  2> Jan 24, 2025 6:12:44 PM org.apache.lucene.internal.vectorization.VectorizationProvider lookup
  2> WARNING: Java vector incubator module is not readable. For optimal vector performance, pass '--add-modules jdk.incubator.vector' to enable Vector API.
  2> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
  2> SLF4J: Defaulting to no-operation (NOP) logger implementation
  2> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
  2> REPRODUCE WITH: ./gradlew ':qa:rolling-upgrade:testAgainstOneThirdUpgradedCluster' --tests "org.opensearch.neuralsearch.bwc.rolling.BatchIngestionIT.testBatchIngestion_SparseEncodingProcessor_E2EFlow" -Dtests.seed=801B4B74838557A -Dtests.security.manager=false -Dtests.bwc.version=2.19.0-SNAPSHOT -Dtests.locale=sw-TZ -Dtests.timezone=America/Bahia_Banderas -Druntime.java=21
  2> java.lang.AssertionError: expected:<10> but was:<8>
        at __randomizedtesting.SeedInfo.seed([801B4B74838557A:8487AE9ACE1A1513]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:647)
        at org.junit.Assert.assertEquals(Assert.java:633)
        at org.opensearch.neuralsearch.BaseNeuralSearchIT.validateDocCountAndInfo(BaseNeuralSearchIT.java:1535)
        at org.opensearch.neuralsearch.bwc.rolling.BatchIngestionIT.testBatchIngestion_SparseEncodingProcessor_E2EFlow(BatchIngestionIT.java:46)
  2> NOTE: leaving temporary files on disk at: /home/runner/work/neural-search/neural-search/qa/rolling-upgrade/build/testrun/testAgainstOneThirdUpgradedCluster/temp/org.opensearch.neuralsearch.bwc.rolling.BatchIngestionIT_801B4B74838557A-001
  2> NOTE: test params are: codec=Asserting(Lucene912): {}, docValues:{}, maxPointsInLeafNode=22, maxMBSortInHeap=5.7175746038185356, sim=Asserting(RandomSimilarity(queryNorm=true): {}), locale=sw-TZ, timezone=America/Bahia_Banderas
  2> NOTE: Linux 6.8.0-1020-azure amd64/Azul Systems, Inc. 21.0.6 (64-bit)/cpus=4,threads=3,free=453753024,total=536870912
  2> NOTE: All tests run in this JVM: [BatchIngestionIT]
BatchIngestionIT > testBatchIngestion_SparseEncodingProcessor_E2EFlow FAILED
    java.lang.AssertionError: expected:<10> but was:<8>
        at __randomizedtesting.SeedInfo.seed([801B4B74838557A:8487AE9ACE1A1513]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:647)
        at org.junit.Assert.assertEquals(Assert.java:633)
        at org.opensearch.neuralsearch.BaseNeuralSearchIT.validateDocCountAndInfo(BaseNeuralSearchIT.java:1535)
        at org.opensearch.neuralsearch.bwc.rolling.BatchIngestionIT.testBatchIngestion_SparseEncodingProcessor_E2EFlow(BatchIngestionIT.java:46)
  1> [2025-01-24T12:12:44,811][INFO ][o.o.n.b.r.BatchIngestionIT] [testBatchIngestion_SparseEncodingProcessor_E2EFlow] before test
  1> [2025-01-24T12:12:45,122][INFO ][o.o.n.b.r.BatchIngestionIT] [testBatchIngestion_SparseEncodingProcessor_E2EFlow] initializing REST clients against [http://[::1]:41085, http://127.0.0.1:40513, http://[::1]:35229, http://127.0.0.1:41063, http://[::1]:41591, http://127.0.0.1:34731]
  1> [2025-01-24T12:12:49,676][INFO ][o.o.n.b.r.BatchIngestionIT] [testBatchIngestion_SparseEncodingProcessor_E2EFlow] There are still tasks running after this test that might break subsequent tests [cluster:admin/opensearch/ml/undeploy_model, cluster:admin/opensearch/mlinternal/syncup, indices:admin/seq_no/global_checkpoint_sync, indices:admin/seq_no/global_checkpoint_sync[p], indices:data/write/bulk, indices:data/write/bulk[s]].
  1> [2025-01-24T12:12:49,722][INFO ][o.o.n.b.r.BatchIngestionIT] [testBatchIngestion_SparseEncodingProcessor_E2EFlow] after test

How can one reproduce the bug?

Run the test locally with the REPRODUCE WITH command from the log above, or raise a PR against neural-search to see it fail in the GitHub CI check.
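To rerun the failure locally with the exact seed, locale, and timezone from the CI log, a small helper can rebuild the REPRODUCE WITH command and optionally run it a few times to see whether the `expected:<10> but was:<8>` mismatch reproduces consistently. This is a minimal sketch, assuming it is run from the root of a neural-search checkout; `repro_command` and `rerun` are hypothetical helper names, and all flag values are copied from the log rather than invented.

```python
import shlex
import subprocess

# Values copied verbatim from the REPRODUCE WITH line in the CI log.
TASK = ":qa:rolling-upgrade:testAgainstOneThirdUpgradedCluster"
TEST = ("org.opensearch.neuralsearch.bwc.rolling.BatchIngestionIT."
        "testBatchIngestion_SparseEncodingProcessor_E2EFlow")
SYS_PROPS = {
    "tests.seed": "801B4B74838557A",
    "tests.security.manager": "false",
    "tests.bwc.version": "2.19.0-SNAPSHOT",
    "tests.locale": "sw-TZ",
    "tests.timezone": "America/Bahia_Banderas",
    "runtime.java": "21",
}


def repro_command(seed=None):
    """Hypothetical helper: build the gradlew argv; `seed` optionally overrides tests.seed."""
    props = dict(SYS_PROPS)
    if seed is not None:
        props["tests.seed"] = seed
    argv = ["./gradlew", TASK, "--tests", TEST]
    # Dict order is preserved, so the -D flags appear in the same order as the log.
    argv += [f"-D{key}={value}" for key, value in props.items()]
    return argv


def rerun(times=3):
    """Hypothetical helper: run the same seed several times and collect exit codes.

    Assumes the current directory is the root of a neural-search checkout.
    """
    return [subprocess.run(repro_command()).returncode for _ in range(times)]


if __name__ == "__main__":
    # Print the command in copy-pasteable shell form.
    print(shlex.join(repro_command()))
```

Passing the argv list to `subprocess.run` avoids shell quoting issues with the test name and timezone; `shlex.join` is only used to print a copy-pasteable form.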

What is the expected behavior?

The test should pass.

Do you have any additional context?

https://github.com/opensearch-project/neural-search/actions/runs/12942873461/job/36138496383?pr=1140

Labels

bug