
[Concurrent Segment Search] shard_min_doc_count and shard_size should not be evaluated at the slice level #8860

Closed
jed326 opened this issue Jul 25, 2023 · 4 comments · Fixed by #9085

Comments

jed326 commented Jul 25, 2023

Subtask for #7357 to focus on the test failures related to the shard size parameter:

REPRODUCE WITH: ./gradlew 'null' --tests "org.opensearch.search.aggregations.bucket.TermsShardMinDocCountIT.testShardMinDocCountSignificantTermsTest" -Dtests.seed=A749B9CB2092798B -Dtests.locale=en-IE -Dtests.timezone=Europe/Madrid -Druntime.java=17

java.lang.AssertionError: 
Expected: <0>
     but: was <1>
Expected :<0>
Actual   :<1>
<Click to see difference>


	at __randomizedtesting.SeedInfo.seed([A749B9CB2092798B:D251D3457B327719]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:964)
	at org.junit.Assert.assertThat(Assert.java:930)
	at org.opensearch.search.aggregations.bucket.TermsShardMinDocCountIT.testShardMinDocCountSignificantTermsTest(TermsShardMinDocCountIT.java:103)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)

REPRODUCE WITH: ./gradlew 'null' --tests "org.opensearch.search.aggregations.bucket.TermsShardMinDocCountIT.testShardMinDocCountTermsTest" -Dtests.seed=A749B9CB2092798B -Dtests.locale=en-IE -Dtests.timezone=Europe/Madrid -Druntime.java=17

java.lang.AssertionError: 
Expected: <0>
     but: was <2>
Expected :<0>
Actual   :<2>
<Click to see difference>


	at __randomizedtesting.SeedInfo.seed([A749B9CB2092798B:BD5353BCAD8763C8]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:964)
	at org.junit.Assert.assertThat(Assert.java:930)
	at org.opensearch.search.aggregations.bucket.TermsShardMinDocCountIT.testShardMinDocCountTermsTest(TermsShardMinDocCountIT.java:169)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
Additional failing tests:

 - org.opensearch.search.aggregations.bucket.ShardSizeTermsIT.testShardSizeEqualsSizeDouble
 - org.opensearch.search.aggregations.bucket.ShardSizeTermsIT.testShardSizeEqualsSizeString
@jed326 jed326 added :test Adding or fixing a test distributed framework labels Jul 25, 2023
@jed326 jed326 self-assigned this Jul 25, 2023
@jed326 jed326 changed the title [Concurrent Segment Search] Fix TermsShardMinDocCountIT related tests [Concurrent Segment Search] Fix shard size parameter related tests Jul 25, 2023
@jed326 jed326 changed the title [Concurrent Segment Search] Fix shard size parameter related tests [Concurrent Segment Search] shard_min_doc_count and shard_size should not be evaluated at the slice level Aug 1, 2023
jed326 commented Aug 1, 2023

shard_size and shard_min_doc_count are being evaluated at the slice level, which leads to unexpected results and causes the test failures above.

High Level Overview:

  • buildAggregations creates a priority queue with size min(buckets, shard_size)
    int size = (int) Math.min(bucketsInOrd, bucketCountThresholds.getShardSize());
    PriorityQueue<B> ordered = buildPriorityQueue(size);
    • This is problem 1 -- shard_size is being evaluated at the slice level. This manifests in two ways: 1) each shard can return up to shard_size * slice_count buckets to the coordinator, so shard_size is not actually enforced, and 2) the doc_count of collected buckets may be inaccurate because the same term may exist on multiple slices, but due to the shard_size limit the bucket for that term is not collected on every slice. (See the sketch after this list.)
  • Iterates through the buckets -- if docCount >= shard_min_doc_count for a bucket, the bucket is added to the priority queue
    • This is problem 2 -- for concurrent search we are checking shard_min_doc_count at the slice level. Across slices the docCount may add up to >= shard_min_doc_count, but if the threshold is not met on a given slice then no bucket is created for the term on that slice.
  • The priority queue drops items as new ones are added, ensuring we only end up with min(buckets, shard_size) buckets
  • At the coordinator level, we do not consume any buckets with less than min_doc_count docs.
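
For reference, here is a minimal, self-contained sketch of the slice-level selection described above. It is not the OpenSearch implementation -- TermBucket, buildSliceBuckets, and the min-heap eviction are simplified stand-ins for the real ordinals-based code -- but it shows where the two problems come from:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.PriorityQueue;

    public class SliceLevelBuildAggregationSketch {

        // Simplified stand-in for a term bucket collected on one slice.
        record TermBucket(String term, long docCount) {}

        /**
         * Slice-level selection: the queue is sized to min(buckets, shard_size) and a
         * bucket is only admitted when its slice-local docCount >= shard_min_doc_count.
         * With N slices per shard this runs N times, which is why up to shard_size * N
         * buckets can reach the coordinator and why a term can be dropped on slices
         * where it happens to be rare.
         */
        static List<TermBucket> buildSliceBuckets(List<TermBucket> bucketsInOrd, int shardSize, long shardMinDocCount) {
            int size = Math.min(bucketsInOrd.size(), shardSize);
            // Min-heap on docCount so the smallest bucket is evicted once the queue is full.
            PriorityQueue<TermBucket> ordered = new PriorityQueue<>(Math.max(size, 1),
                (a, b) -> Long.compare(a.docCount(), b.docCount()));
            for (TermBucket bucket : bucketsInOrd) {
                if (bucket.docCount() >= shardMinDocCount) {   // problem 2: threshold applied per slice
                    ordered.offer(bucket);
                    if (ordered.size() > size) {
                        ordered.poll();                        // problem 1: shard_size applied per slice
                    }
                }
            }
            return new ArrayList<>(ordered);
        }
    }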

High Level Solutions:

I have a few high level solutions in mind. The big problem is with shard_size. We should have some sort of limit on the number of buckets a slice can collect, but if we ignore shard_size at the slice level then it could grow unbounded. However, introducing any slice level limit on bucket count (let's call it slice_size for now) can result in behavioral changes between the concurrent and non-concurrent search cases when the number of ordinals is much larger than slice_size: the doc_count could be inaccurate because docs from some slices are missed. This is analogous to how shard_size behaves in the 1 shard vs. multiple shards case. It remains true even if we introduce slice level settings, which I don't believe we should do, because the user has no control over or knowledge of how documents are distributed across slices, so such controls wouldn't really mean anything to them.

Essentially there are a few different ways we can address both shard_min_doc_count and shard_size and we can combine the solutions in different ways.

shard_min_doc_count:

  1. No changes
  2. Ignore at the slice level
  3. Ignore at the slice level and apply at shard reduce
  4. Introduce new slice level setting + [3]

shard_size:

  1. No changes
  2. Ignore at the slice level
  3. Instead of applying shard_size at the slice level, use some default higher than shard_size, analogous to how shard_size itself defaults to 1.5 * size + 10. (See the sketch at the end of this comment.)
  4. Ignore at the slice level, apply at the shard level reduce. This requires using the same comparator for ordering purposes.
  5. Introduce new slice level setting + [4]

Currently my recommendation is [3] for shard_min_doc_count and [3] for shard_size. I'm split on whether we should reduce the number of buckets down to shard_size at the shard level reduce. The intention of shard_size is to give users a way to get more accurate results, so with solution [3] for shard_size I don't see a purpose in doing that reduce here.
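
To make [3] for shard_size concrete, here is a small sketch of what an over-sized slice-level bound could look like. The helper name suggestSliceQueueSize and the exact 1.5x + 10 factor are hypothetical, chosen only to mirror how the existing shard_size default is derived from size; this is not the actual fix:

    // Hypothetical helper illustrating option [3] for shard_size: keep a slice-level bound,
    // but make it larger than the requested shard_size so a slice is far less likely to drop
    // a bucket that would have survived a shard-level cut.
    final class SliceSizeHeuristic {
        static int suggestSliceQueueSize(int shardSize) {
            long sliceSize = (long) (shardSize * 1.5) + 10;      // over-sized slice bound
            return (int) Math.min(sliceSize, Integer.MAX_VALUE); // guard against overflow
        }

        public static void main(String[] args) {
            // e.g. shard_size = 25 -> each slice keeps up to 47 buckets before the shard-level reduce
            System.out.println(suggestSliceQueueSize(25));
        }
    }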

jed326 commented Aug 1, 2023

@sohami @reta @andrross Would like to get your input on this. It seems difficult to completely preserve the behavior between concurrent and non-concurrent search cases here.

sohami commented Aug 1, 2023

@jed326 Thanks for the analysis.

The big problem is with shard_size. We should have some sort of limit on the number of buckets a slice can collect, but if we ignore shard_size at the slice level then it could grow unbounded

I didn't understand what you mean by it could grow unbounded. For the sequential case as well, the shard_size limit is applied after document collection is done on the shard and before returning the output (i.e. in buildAggregation). So if we ignore the shard_size parameter at the slice level but apply it before sending results to the coordinator, the same bounds can still be applied.

I think for both of the shard level parameters, if we ignore them at the slice level and then apply them as part of the reduce, it should work as expected. What are the challenges with shard_size option [4]?

jed326 commented Aug 1, 2023

I didn't understand what you mean by it could grow unbounded. For the sequential case as well, the shard_size limit is applied after document collection is done on the shard and before returning the output (i.e. in buildAggregation).

You are right that at the shard level the same bounds would still be applied, so from the coordinator's perspective concurrent and non-concurrent search would look the same. However, the unbounded growth I am concerned about here is the size of the priority queue at the slice level, which is also the number of buckets being returned per slice. Today this is bounded by the shard_size parameter, but if we ignore that then each slice would return a bucket for every single ordinal, which looks like a serious scaling issue to me. I think this is analogous to why the shard_size parameter exists at all, instead of just returning every single bucket to the coordinator and applying the size bound there.
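
A rough worked example of the difference (the numbers are made up for illustration, and the worst case assumes every term appears on every slice):

    public class SliceBucketGrowth {
        public static void main(String[] args) {
            long distinctTermsOnShard = 1_000_000L;
            int shardSize = 25;
            int sliceCount = 4;

            // Today: each slice's priority queue is capped at shard_size, so the shard-level
            // reduce sees at most shard_size * slice_count buckets (100 here).
            long withSliceCap = (long) shardSize * sliceCount;

            // If shard_size is simply ignored at the slice level, each slice keeps a bucket for
            // every ordinal it sees, so the shard-level reduce may have to merge up to one
            // bucket per distinct term per slice.
            long withoutSliceCap = distinctTermsOnShard * sliceCount;

            System.out.println("buckets into shard-level reduce with slice cap:    " + withSliceCap);
            System.out.println("buckets into shard-level reduce without slice cap: " + withoutSliceCap);
        }
    }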
