-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Clean up how pipeline aggs check for multi-bucket #54161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pipeline aggregations like `stats_bucket`, `sum_bucket`, and `percentiles_bucket` only operate on buckets that have multiple buckets. This adds support for those aggregations to `geo_distance`, `ip_range`, `auto_date_histogram`, and `rare_terms`. This all happened because we used a marker interface to mark compatible aggs, `MultiBucketAggregationBuilder` and it was fairly easy to forget to implement the interface. This replaces the marker interface with an abstract method in `AggregationBuilder`, `bucketCardinality` which makes you return `NONE`, `ONE`, or `MANY`. The `bucket` aggregations can check for `MANY`. At this point `ONE` and `NONE` amount to about the same thing, but I suspect that'll be a useful distinction when validating bucket sorts. Closes elastic#53215
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
@elasticmachine, run elasticsearch-ci/2 |
The bwc failures are legit. The fixes for these aggs need skips. The elasticsearch-ci/2 didn't look legit at first glance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rad, I like this. As a followup I wonder if we could give a similar treatment for "sequential" vs "non-sequential" aggs (e.g. date_histo has sequentially ordered buckets). Would allow us to cleanup that spot in pipeline validation where certain pipelines need sequential buckets.
(I think the merge conflicts are just little things, like removed generics, due to the VS merge. Should be easy enough to fix I hope)
* This might be a bit of a lie though because you can't really use | ||
* pipeline aggregations on composite aggregations. | ||
*/ | ||
return BucketCardinality.MANY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... I wonder if we should set this to NONE
for the time being, to prevent people from accidentally using pipelines on composite? There's an issue flying around somewhere discussing it... but last time we chatted, we decided to punt on the decision. Some pipelines are theoretically safe (bucket_script, operating on the bucket itself), but others like derivative
are totally not kosher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Composite is a weird beast.
I think so! |
…c#54161) Pipeline aggregations like `stats_bucket`, `sum_bucket`, and `percentiles_bucket` only operate on buckets that have multiple buckets. This adds support for those aggregations to `geo_distance`, `ip_range`, `auto_date_histogram`, and `rare_terms`. This all happened because we used a marker interface to mark compatible aggs, `MultiBucketAggregationBuilder` and it was fairly easy to forget to implement the interface. This replaces the marker interface with an abstract method in `AggregationBuilder`, `bucketCardinality` which makes you return `NONE`, `ONE`, or `MANY`. The `bucket` aggregations can check for `MANY`. At this point `ONE` and `NONE` amount to about the same thing, but I suspect that'll be a useful distinction when validating bucket sorts. Closes elastic#53215
* upstream/master: (447 commits) Adjust BWC version on node roles being sorted Deprecate node local storage setting (elastic#54374) Remove Unused BaseNodeRequest (elastic#54323) Decouple environment from DiscoveryNode (elastic#54373) Ensure that the output of node roles are sorted (elastic#54376) Do not stash environment in security (elastic#54372) Do not stash environment in machine learning (elastic#54371) Clean up how pipeline aggs check for multi-bucket (elastic#54161) Remove toString from Painless user tree nodes (elastic#54117) Docs: Use splitOnToken instead of custom function (elastic#48408) bwc: enable script cache in node stats (elastic#54362) bwc: Mute for script cache in node stats (elastic#54359) Test to enforce response to invalid data stream names Scripting: Use randomValueOtherThan in stats test (elastic#54356) Move transport decoding and aggregation to server (elastic#48263) Mute ScriptServiceTests.testContextCacheStats (elastic#54352) Improve checkstyle performance and readability (elastic#54308) Disable Gradle daemon when executing Windows packaging tests (elastic#54316) [Docs] Minor fix for SubmitAsyncSearchRequest.keepOnCompletion javadoc (elastic#54325) [DOCS] Fix typos in top metrics agg docs (elastic#54299) ... Conflicts: server/src/main/java/org/elasticsearch/index/IndexModule.java server/src/main/java/org/elasticsearch/index/IndexService.java server/src/main/java/org/elasticsearch/indices/IndicesService.java server/src/test/java/org/elasticsearch/index/IndexModuleTests.java
`master` wasn't sending `auto_date_histogram`'s `minimumIntervalExpression` over the wire to `7.x` nodes even though everything above `7.3` expected it. We never noticed this because we didn't have any yml tests for `auto_date_histogram` until I added some in elastic#54161. Closes elastic#54396
#54379) Pipeline aggregations like `stats_bucket`, `sum_bucket`, and `percentiles_bucket` only operate on buckets that have multiple buckets. This adds support for those aggregations to `geo_distance`, `ip_range`, `auto_date_histogram`, and `rare_terms`. This all happened because we used a marker interface to mark compatible aggs, `MultiBucketAggregationBuilder` and it was fairly easy to forget to implement the interface. This replaces the marker interface with an abstract method in `AggregationBuilder`, `bucketCardinality` which makes you return `NONE`, `ONE`, or `MANY`. The `bucket` aggregations can check for `MANY`. At this point `ONE` and `NONE` amount to about the same thing, but I suspect that'll be a useful distinction when validating bucket sorts. Closes #53215
I've finished the backport but I haven't updated the branches on master to run the bwc tests properly. Before I do that I'm going to handle #54382 which I discovered while backporting. |
Now that elastic#54161 is backported we can run its tests against nodes that have the fix.
Now that #54161 is backported we can run its tests against nodes that have the fix. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Pipeline aggregations like
stats_bucket
,sum_bucket
, andpercentiles_bucket
only operate on buckets that have multiple buckets.This adds support for those aggregations to
geo_distance
,ip_range
,auto_date_histogram
, andrare_terms
.This all happened because we used a marker interface to mark compatible
aggs,
MultiBucketAggregationBuilder
and it was fairly easy to forgetto implement the interface.
This replaces the marker interface with an abstract method in
AggregationBuilder
,bucketCardinality
which makes you returnNONE
,ONE
, orMANY
. Thebucket
aggregations can check forMANY
. Atthis point
ONE
andNONE
amount to about the same thing, but Isuspect that'll be a useful distinction when validating bucket sorts.
Closes #53215