Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix AbstractStringFieldDataTestCase tests to account for TotalHits lower bound #4270

Merged
merged 2 commits into from
Aug 21, 2022

Conversation

dbwiddis
Copy link
Member

Signed-off-by: Daniel Widdis widdis@gmail.com

Description

From Lucene's IndexSearcher javaDoc:

NOTE: The search(org.apache.lucene.search.Query, int) and searchAfter(org.apache.lucene.search.ScoreDoc, org.apache.lucene.search.Query, int) methods are configured to only count top hits accurately up to 1,000 and may return a lower bound of the hit count if the hit count is greater than or equal to 1,000. On queries that match lots of documents, counting the number of hits may take much longer than computing the top hits so this trade-off allows to get some minimal information about the hit count without slowing down search too much.

The AbstractStringFieldDataTestCase assumed the totalHits value was exact.

This fixes the test to test for either exact accuracy or lower bound, as appropriate.

Issues Resolved

Fixes #4238

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Daniel Widdis <widdis@gmail.com>
@dbwiddis dbwiddis requested review from a team and reta as code owners August 19, 2022 21:05
if (topDocs.totalHits.relation == TotalHits.Relation.EQUAL_TO) {
assertEquals(numDocs, topDocs.totalHits.value);
} else {
assertTrue(numDocs >= topDocs.totalHits.value);
Copy link
Member

@owaiskazi19 owaiskazi19 Aug 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just have the below

assertTrue(numDocs >= topDocs.totalHits.value); 

rather the whole condition?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the test, our total documents (numDocs) is not always over 1000, and so it is more precise to test for equality in the 1/3 of cases where a more accurate result will apply.

This will protect against a bug where totalHits is always some small constant like 1, which would pass the lower bound test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I might argue we could add another assertion that totalHits.value >= 1000 in this case, but that might be overkill.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree it's overkill...

@owaiskazi19
Copy link
Member

Pre-commit failing because of the usual Spotless

Execution failed for task ':server:spotlessJavaCheck'.
> The following files had format violations:
      src/test/java/org/opensearch/index/fielddata/AbstractStringFieldDataTestCase.java
          @@ -343,9 +343,9 @@
           ········);
           ········//·As·of·Lucene·9.0.0,·totalHits·may·be·a·lower·bound
           ········if·(topDocs.totalHits.relation·==·TotalHits.Relation.EQUAL_TO)·{
          -············assertEquals(numDocs,·topDocs.totalHits.value);············
          +············assertEquals(numDocs,·topDocs.totalHits.value);
           ········}·else·{
          -············assertTrue(numDocs·>=·topDocs.totalHits.value);························
          +············assertTrue(numDocs·>=·topDocs.totalHits.value);
           ········}
           ········BytesRef·previousValue·=·first·?·null·:·reverse·?·UnicodeUtil.BIG_TERM·:·new·BytesRef();
           ········for·(int·i·=·0;·i·<·topDocs.scoreDocs.length;·++i)·{

Signed-off-by: Daniel Widdis <widdis@gmail.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Copy link
Collaborator

@nknize nknize left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thx for adding this check @dbwiddis! I had completely forgot about this change.

@heemin32
Copy link
Contributor

Shouldn't we back port this PR to 2.x branch?

@dblock dblock added the backport 2.x Backport to 2.x branch label Oct 21, 2022
@dblock
Copy link
Member

dblock commented Oct 21, 2022

I labeled it to backport, let's see.

opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 21, 2022
…wer bound (#4270)

Fixes tests to account for TotalHits uncertainty as of Lucene 9.

Signed-off-by: Daniel Widdis <widdis@gmail.com>
(cherry picked from commit 4643620)
andrross pushed a commit that referenced this pull request Oct 21, 2022
…or TotalHits lower bound (#4867)

* Fix AbstractStringFieldDataTestCase tests to account for TotalHits lower bound (#4270)

Fixes tests to account for TotalHits uncertainty as of Lucene 9.

Signed-off-by: Daniel Widdis <widdis@gmail.com>
(cherry picked from commit 4643620)

* Added CHANGELOG.

Signed-off-by: Daniel (dB.) Doubrovkine <dblock@amazon.com>

Signed-off-by: Daniel (dB.) Doubrovkine <dblock@amazon.com>
Co-authored-by: Daniel Widdis <widdis@gmail.com>
Co-authored-by: Daniel (dB.) Doubrovkine <dblock@amazon.com>
@dblock
Copy link
Member

dblock commented Oct 21, 2022

@dbwiddis Looking at the implementation here: not in love with conditions in specs. When is it one or the other?

@dbwiddis
Copy link
Member Author

dbwiddis commented Oct 19, 2024

@dbwiddis Looking at the implementation here: not in love with conditions in specs. When is it one or the other?

@dblock answering nearly two years later, sorry. The initial PR comment javadoc explains the difference. Lucene can compute the hit count and if it's > 1000 it makes it an approximate bound, indicated by the enum used for the conditional.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Flakey test failure in SortedSetDVStringFieldDataTests testSortMissingLast
5 participants