Fix AbstractStringFieldDataTestCase tests to account for TotalHits lower bound #4270

dbwiddis · 2022-08-19T21:05:25Z

Signed-off-by: Daniel Widdis widdis@gmail.com

Description

From Lucene's IndexSearcher javaDoc:

NOTE: The search(org.apache.lucene.search.Query, int) and searchAfter(org.apache.lucene.search.ScoreDoc, org.apache.lucene.search.Query, int) methods are configured to only count top hits accurately up to 1,000 and may return a lower bound of the hit count if the hit count is greater than or equal to 1,000. On queries that match lots of documents, counting the number of hits may take much longer than computing the top hits so this trade-off allows to get some minimal information about the hit count without slowing down search too much.

The AbstractStringFieldDataTestCase assumed the totalHits value was exact.

This change makes the test assert an exact match when the reported relation is EQUAL_TO, and only a lower bound otherwise.

Issues Resolved

Fixes #4238

Check List

- New functionality includes testing.
  - All tests pass
- New functionality has been documented.
  - New functionality has javadoc added
- Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Daniel Widdis <widdis@gmail.com>

owaiskazi19 · 2022-08-19T21:08:26Z

server/src/test/java/org/opensearch/index/fielddata/AbstractStringFieldDataTestCase.java

+        if (topDocs.totalHits.relation == TotalHits.Relation.EQUAL_TO) {
+            assertEquals(numDocs, topDocs.totalHits.value);            
+        } else {
+            assertTrue(numDocs >= topDocs.totalHits.value);                        


Can't we just have the below

assertTrue(numDocs >= topDocs.totalHits.value);

rather than the whole condition?

In the test, our total documents (numDocs) is not always over 1000, and so it is more precise to test for equality in the 1/3 of cases where a more accurate result will apply.

OpenSearch/server/src/test/java/org/opensearch/index/fielddata/AbstractStringFieldDataTestCase.java

Line 261 in 05a5819

final int numDocs = scaledRandomIntBetween(10, 3072);

This will protect against a bug where totalHits is always some small constant like 1, which would pass the lower bound test.

(I might argue we could add another assertion that totalHits.value >= 1000 in this case, but that might be overkill.)

Agree it's overkill...
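
The trade-off in this thread can be illustrated with a minimal sketch. It does not use Lucene: `Relation` and `Hits` are hypothetical stand-ins for `TotalHits.Relation` and `TotalHits`, and `lowerBoundOnly` and `relationAware` are illustrative names for the two assertion styles under discussion. The sketch shows a broken hit count of 1 slipping past the lower-bound-only check while being caught when the relation is `EQUAL_TO`.

```java
public class TotalHitsCheckSketch {
    // Hypothetical stand-ins for Lucene's TotalHits.Relation and TotalHits.
    enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

    static final class Hits {
        final long value;
        final Relation relation;

        Hits(long value, Relation relation) {
            this.value = value;
            this.relation = relation;
        }
    }

    // The simpler check proposed in review: only require value <= numDocs.
    static boolean lowerBoundOnly(int numDocs, Hits hits) {
        return numDocs >= hits.value;
    }

    // The conditional check: exact when the count is exact, lower bound otherwise.
    static boolean relationAware(int numDocs, Hits hits) {
        if (hits.relation == Relation.EQUAL_TO) {
            return numDocs == hits.value;
        }
        return numDocs >= hits.value;
    }

    public static void main(String[] args) {
        int numDocs = 500;
        // A wrong count of 1 that is nevertheless reported as exact.
        Hits buggy = new Hits(1, Relation.EQUAL_TO);
        System.out.println(lowerBoundOnly(numDocs, buggy)); // true: the bug goes unnoticed
        System.out.println(relationAware(numDocs, buggy));  // false: the bug is caught
    }
}
```

The conditional form costs one branch in the test but keeps the stronger equality check for the runs where the count is reported as exact.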

owaiskazi19 · 2022-08-19T21:12:54Z

Pre-commit failing because of the usual Spotless

Execution failed for task ':server:spotlessJavaCheck'.
> The following files had format violations:
      src/test/java/org/opensearch/index/fielddata/AbstractStringFieldDataTestCase.java
          @@ -343,9 +343,9 @@
           ········);
           ········//·As·of·Lucene·9.0.0,·totalHits·may·be·a·lower·bound
           ········if·(topDocs.totalHits.relation·==·TotalHits.Relation.EQUAL_TO)·{
          -············assertEquals(numDocs,·topDocs.totalHits.value);············
          +············assertEquals(numDocs,·topDocs.totalHits.value);
           ········}·else·{
          -············assertTrue(numDocs·>=·topDocs.totalHits.value);························
          +············assertTrue(numDocs·>=·topDocs.totalHits.value);
           ········}
           ········BytesRef·previousValue·=·first·?·null·:·reverse·?·UnicodeUtil.BIG_TERM·:·new·BytesRef();
           ········for·(int·i·=·0;·i·<·topDocs.scoreDocs.length;·++i)·{

Signed-off-by: Daniel Widdis <widdis@gmail.com>

github-actions · 2022-08-19T21:24:16Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/1949/
CommitID: 0ec0651

github-actions · 2022-08-19T21:48:01Z

Gradle Check (Jenkins) Run Completed with:

RESULT: SUCCESS ✅
URL: https://build.ci.opensearch.org/job/gradle-check/1950/
CommitID: c497919

nknize

LGTM! Thx for adding this check @dbwiddis! I had completely forgotten about this change.

heemin32 · 2022-10-21T01:14:44Z

Shouldn't we backport this PR to the 2.x branch?

dblock · 2022-10-21T14:02:24Z

I labeled it to backport, let's see.

…wer bound (#4270)

Fixes tests to account for TotalHits uncertainty as of Lucene 9.

Signed-off-by: Daniel Widdis <widdis@gmail.com>
(cherry picked from commit 4643620)

…or TotalHits lower bound (#4867)

* Fix AbstractStringFieldDataTestCase tests to account for TotalHits lower bound (#4270)

Fixes tests to account for TotalHits uncertainty as of Lucene 9.

Signed-off-by: Daniel Widdis <widdis@gmail.com>
(cherry picked from commit 4643620)

* Added CHANGELOG.

Signed-off-by: Daniel (dB.) Doubrovkine <dblock@amazon.com>
Signed-off-by: Daniel (dB.) Doubrovkine <dblock@amazon.com>
Co-authored-by: Daniel Widdis <widdis@gmail.com>
Co-authored-by: Daniel (dB.) Doubrovkine <dblock@amazon.com>

dblock · 2022-10-21T15:55:52Z

@dbwiddis Looking at the implementation here: not in love with conditions in specs. When is it one or the other?

dbwiddis · 2024-10-19T17:34:57Z

> @dbwiddis Looking at the implementation here: not in love with conditions in specs. When is it one or the other?

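@dblock answering nearly two years later, sorry. The javadoc quoted in the initial PR comment explains the difference. Lucene counts hits accurately only up to 1,000; once the hit count is greater than or equal to 1,000 it may return a lower bound instead of the exact count. The relation enum on totalHits says which of the two you got, and that enum is what the conditional checks.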
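
As a rough mental model of when each relation appears, the sketch below counts matches exactly until a threshold is exceeded, then stops and reports what it has as a lower bound. This is a simplified illustration, not Lucene's implementation: `countUpTo`, `Hits`, and `Relation` are hypothetical names, and the exact behaviour at the threshold boundary in Lucene may differ from this model.

```java
public class ThresholdCountSketch {
    // Hypothetical stand-ins for Lucene's TotalHits.Relation and TotalHits.
    enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

    static final class Hits {
        final long value;
        final Relation relation;

        Hits(long value, Relation relation) {
            this.value = value;
            this.relation = relation;
        }
    }

    // Counts matching documents, but gives up once the count exceeds the threshold.
    // Past that point the returned value is only a lower bound on the true count.
    static Hits countUpTo(int matchingDocs, int threshold) {
        long counted = 0;
        for (int doc = 0; doc < matchingDocs; doc++) {
            counted++;
            if (counted > threshold) {
                return new Hits(counted, Relation.GREATER_THAN_OR_EQUAL_TO);
            }
        }
        return new Hits(counted, Relation.EQUAL_TO);
    }

    public static void main(String[] args) {
        Hits small = countUpTo(500, 1000);
        Hits large = countUpTo(3000, 1000);
        System.out.println(small.relation + " " + small.value); // EQUAL_TO 500
        System.out.println(large.relation + " " + large.value); // GREATER_THAN_OR_EQUAL_TO 1001
    }
}
```

In this model a test with a random number of documents on either side of the threshold sees both relations across runs, which is why a single unconditional equality assertion fails intermittently.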

Fix tests to account for TotalHits uncertainty

0ec0651

Signed-off-by: Daniel Widdis <widdis@gmail.com>

dbwiddis requested review from a team and reta as code owners August 19, 2022 21:05

owaiskazi19 reviewed Aug 19, 2022


Add another assertion and fix formatting

c497919

Signed-off-by: Daniel Widdis <widdis@gmail.com>

dbwiddis requested a review from owaiskazi19 August 19, 2022 21:23

owaiskazi19 approved these changes Aug 19, 2022


owaiskazi19 requested review from nknize and kartg August 19, 2022 22:11

nknize approved these changes Aug 21, 2022


nknize merged commit 4643620 into opensearch-project:main Aug 21, 2022

dbwiddis deleted the totalHitsLowerBound branch August 22, 2022 15:08

This was referenced Oct 20, 2022

[Backport 2.x]Support of GeoJson Point for GeoPoint field (#4597) #4842

Merged

[BUG] Failing test: org.opensearch.index.fielddata.SortedSetDVStringFieldDataTests.testSortMissingLastReverse #4861

Closed

dblock added the backport 2.x Backport to 2.x branch label Oct 21, 2022

opensearch-trigger-bot bot mentioned this pull request Oct 21, 2022

[Backport 2.x] Fix AbstractStringFieldDataTestCase tests to account for TotalHits lower bound #4867

Merged

dbwiddis mentioned this pull request Oct 19, 2024

[BUGFIX] Fix missing fields to resolve Strict Dynamic Mapping issue when saving task result #16201

Merged


This was referenced Oct 22, 2024

Fix flaky test in testApproximateRangeWithSizeOverDefault by adjusting totalHits assertion logic #16433

Closed

Fix flaky test in testApproximateRangeWithSizeOverDefault by adjusting totalHits assertion logic #16434

Merged
