Skip to content

Clean up how pipeline aggs check for multi-bucket #54161

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 28, 2020

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Mar 25, 2020

Pipeline aggregations like stats_bucket, sum_bucket, and
percentiles_bucket only operate on buckets that have multiple buckets.
This adds support for those aggregations to geo_distance, ip_range,
auto_date_histogram, and rare_terms.

This all happened because we used a marker interface to mark compatible
aggs, MultiBucketAggregationBuilder and it was fairly easy to forget
to implement the interface.

This replaces the marker interface with an abstract method in
AggregationBuilder, bucketCardinality which makes you return NONE,
ONE, or MANY. The bucket aggregations can check for MANY. At
this point ONE and NONE amount to about the same thing, but I
suspect that'll be a useful distinction when validating bucket sorts.

Closes #53215

Pipeline aggregations like `stats_bucket`, `sum_bucket`, and
`percentiles_bucket` only operate on buckets that have multiple buckets.
This adds support for those aggregations to `geo_distance`, `ip_range`,
`auto_date_histogram`, and `rare_terms`.

This all happened because we used a marker interface to mark compatible
aggs, `MultiBucketAggregationBuilder` and it was fairly easy to forget
to implement the interface.

This replaces the marker interface with an abstract method in
`AggregationBuilder`, `bucketCardinality` which makes you return `NONE`,
`ONE`, or `MANY`. The `bucket` aggregations can check for `MANY`. At
this point `ONE` and `NONE` amount to about the same thing, but I
suspect that'll be a useful distinction when validating bucket sorts.

Closes elastic#53215
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@bpintea bpintea added v7.8.0 and removed v7.7.0 labels Mar 25, 2020
@nik9000
Copy link
Member Author

nik9000 commented Mar 25, 2020

@elasticmachine, run elasticsearch-ci/2

@nik9000
Copy link
Member Author

nik9000 commented Mar 25, 2020

The bwc failures are legit. The fixes for these aggs need skips. The elasticsearch-ci/2 didn't look legit at first glance.

Copy link
Contributor

@polyfractal polyfractal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rad, I like this. As a followup I wonder if we could give a similar treatment for "sequential" vs "non-sequential" aggs (e.g. date_histo has sequentially ordered buckets). Would allow us to cleanup that spot in pipeline validation where certain pipelines need sequential buckets.

(I think the merge conflicts are just little things, like removed generics, due to the VS merge. Should be easy enough to fix I hope)

* This might be a bit of a lie though because you can't really use
* pipeline aggregations on composite aggregations.
*/
return BucketCardinality.MANY;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I wonder if we should set this to NONE for the time being, to prevent people from accidentally using pipelines on composite? There's an issue flying around somewhere discussing it... but last time we chatted, we decided to punt on the decision. Some pipelines are theoretically safe (bucket_script, operating on the bucket itself), but others like derivative are totally not kosher.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Composite is a weird beast.

@nik9000
Copy link
Member Author

nik9000 commented Mar 27, 2020

Rad, I like this. As a followup I wonder if we could give a similar treatment for "sequential" vs "non-sequential" aggs (e.g. date_histo has sequentially ordered buckets). Would allow us to cleanup that spot in pipeline validation where certain pipelines need sequential buckets.

I think so!

@nik9000 nik9000 merged commit a0f7c4a into elastic:master Mar 28, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Mar 28, 2020
…c#54161)

Pipeline aggregations like `stats_bucket`, `sum_bucket`, and
`percentiles_bucket` only operate on buckets that have multiple buckets.
This adds support for those aggregations to `geo_distance`, `ip_range`,
`auto_date_histogram`, and `rare_terms`.

This all happened because we used a marker interface to mark compatible
aggs, `MultiBucketAggregationBuilder` and it was fairly easy to forget
to implement the interface.

This replaces the marker interface with an abstract method in
`AggregationBuilder`, `bucketCardinality` which makes you return `NONE`,
`ONE`, or `MANY`. The `bucket` aggregations can check for `MANY`. At
this point `ONE` and `NONE` amount to about the same thing, but I
suspect that'll be a useful distinction when validating bucket sorts.

Closes elastic#53215
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Mar 30, 2020
* upstream/master: (447 commits)
  Adjust BWC version on node roles being sorted
  Deprecate node local storage setting (elastic#54374)
  Remove Unused BaseNodeRequest (elastic#54323)
  Decouple environment from DiscoveryNode (elastic#54373)
  Ensure that the output of node roles are sorted (elastic#54376)
  Do not stash environment in security (elastic#54372)
  Do not stash environment in machine learning (elastic#54371)
  Clean up how pipeline aggs check for multi-bucket (elastic#54161)
  Remove toString from Painless user tree nodes (elastic#54117)
  Docs: Use splitOnToken instead of custom function (elastic#48408)
  bwc: enable script cache in node stats (elastic#54362)
  bwc: Mute for script cache in node stats (elastic#54359)
  Test to enforce response to invalid data stream names
  Scripting: Use randomValueOtherThan in stats test (elastic#54356)
  Move transport decoding and aggregation to server (elastic#48263)
  Mute ScriptServiceTests.testContextCacheStats (elastic#54352)
  Improve checkstyle performance and readability (elastic#54308)
  Disable Gradle daemon when executing Windows packaging tests (elastic#54316)
  [Docs] Minor fix for SubmitAsyncSearchRequest.keepOnCompletion javadoc (elastic#54325)
  [DOCS] Fix typos in top metrics agg docs (elastic#54299)
  ...

Conflicts:
	server/src/main/java/org/elasticsearch/index/IndexModule.java
	server/src/main/java/org/elasticsearch/index/IndexService.java
	server/src/main/java/org/elasticsearch/indices/IndicesService.java
	server/src/test/java/org/elasticsearch/index/IndexModuleTests.java
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Mar 30, 2020
`master` wasn't sending `auto_date_histogram`'s
`minimumIntervalExpression` over the wire to `7.x` nodes even though
everything above `7.3` expected it. We never noticed this because we
didn't have any yml tests for `auto_date_histogram` until I added some
in elastic#54161.

Closes elastic#54396
nik9000 added a commit that referenced this pull request Mar 30, 2020
#54379)

Pipeline aggregations like `stats_bucket`, `sum_bucket`, and
`percentiles_bucket` only operate on buckets that have multiple buckets.
This adds support for those aggregations to `geo_distance`, `ip_range`,
`auto_date_histogram`, and `rare_terms`.

This all happened because we used a marker interface to mark compatible
aggs, `MultiBucketAggregationBuilder` and it was fairly easy to forget
to implement the interface.

This replaces the marker interface with an abstract method in
`AggregationBuilder`, `bucketCardinality` which makes you return `NONE`,
`ONE`, or `MANY`. The `bucket` aggregations can check for `MANY`. At
this point `ONE` and `NONE` amount to about the same thing, but I
suspect that'll be a useful distinction when validating bucket sorts.

Closes #53215
nik9000 added a commit that referenced this pull request Mar 30, 2020
`master` wasn't sending `auto_date_histogram`'s
`minimumIntervalExpression` over the wire to `7.x` nodes even though
everything above `7.3` expected it. We never noticed this because we
didn't have any yml tests for `auto_date_histogram` until I added some
in #54161.

Closes #54396
@nik9000
Copy link
Member Author

nik9000 commented Mar 30, 2020

I've finished the backport but I haven't updated the branches on master to run the bwc tests properly. Before I do that I'm going to handle #54382 which I discovered while backporting.

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Mar 30, 2020
Now that elastic#54161 is backported we can run its tests against nodes that
have the fix.
nik9000 added a commit that referenced this pull request Mar 31, 2020
Now that #54161 is backported we can run its tests against nodes that
have the fix.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Some multi-bucket aggs cannot be used in pipeline aggs
5 participants