Add "none" gap_policy to pipeline aggs #44516

Open
wants to merge 13 commits into
base: main

@polyfractal polyfractal commented Jul 17, 2019

WIP

This adds a "none" gap policy to pipeline aggregations: a policy that leaves gaps untouched. Missing/empty buckets are passed to the aggregation without any transformation. Some aggs (derivative, etc.) can't handle null/missing values, so they will not emit any values for the affected buckets. Other aggs (like bucket_script) are executed and given the bucket value (null, NaN, etc.) so the user can decide what to do with it.

This also adds a doc_count which is accessible from pipeline scripts, so that the user can differentiate between null-from-missing and null-from-document.
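
To make the description above concrete, here is a rough sketch of a request body that opts into the proposed policy, written as a Python dict so it can be serialized to JSON. Only `gap_policy: "none"` and `params.doc_count` come from this PR; the aggregation names (`sales_per_month`, `avg_price`, `adjusted_price`), the field names (`date`, `price`) and the script itself are made up for illustration, and the final behaviour may differ from what is shown.

```python
import json

# Hypothetical names: "sales_per_month", "avg_price", "adjusted_price",
# and the fields "date" / "price". Only gap_policy "none" and
# params.doc_count are taken from this PR's description.
request_body = {
    "size": 0,
    "aggs": {
        "sales_per_month": {
            "date_histogram": {"field": "date", "calendar_interval": "month"},
            "aggs": {
                # avg has no value for an empty bucket, which creates a gap.
                "avg_price": {"avg": {"field": "price"}},
                "adjusted_price": {
                    "bucket_script": {
                        "buckets_path": {"avg_price": "avg_price"},
                        # Proposed policy: pass gaps through untouched.
                        "gap_policy": "none",
                        # The script decides what an empty bucket means,
                        # using doc_count to detect it.
                        "script": "params.doc_count == 0 ? 0 : params.avg_price * 2",
                    }
                },
            },
        }
    },
}

# Serialize to the JSON that would be sent as the search request body.
print(json.dumps(request_body, indent=2))
```

With the existing policies the script would not get the chance to handle the empty bucket itself; here the decision is left to the script.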

This depends on #44179 for some changes to the test framework. MovingFn is currently broken because the queue doesn't support null values, but this should be alleviated once #44360 merges.

Closes: #27377, #42281

BucketScript was using the old-style parser and could easily be
converted over to the newer static parser.

Also adds a test for GapPolicy enum serialization

This adjusts the `buckets_path` parser so that pipeline aggs can
select specific buckets (via their bucket keys) instead of fetching
the entire set of buckets.  This is useful for bucket_script in
particular, which might want specific buckets for calculations.

It's possible to work around this with `filter` aggs, but the workaround
is hacky and probably less performant.
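
As a toy illustration of selecting one bucket by key, the sketch below resolves a path against an already-reduced response in plain Python. It assumes a path of the form `agg_name['bucket_key']>metric_name`; that form, the helper `select_bucket_metric`, and the sample data (`sale_type`, `hat`, `bag`, `sales`) are assumptions for the example, not the parser code from this commit.

```python
import re

# Assumed path form: agg_name['bucket_key']>metric_name
_PATH = re.compile(r"^(?P<agg>[^\[\]>]+)\['(?P<key>[^']+)'\]>(?P<metric>.+)$")


def select_bucket_metric(path, agg_results):
    """Return the metric value of the single bucket named in the path."""
    match = _PATH.match(path)
    if match is None:
        raise ValueError(f"unsupported path: {path!r}")
    for bucket in agg_results[match["agg"]]["buckets"]:
        if bucket["key"] == match["key"]:
            return bucket[match["metric"]]["value"]
    return None  # no bucket with that key


# Hypothetical response fragment from a terms agg with a sum sub-agg.
agg_results = {
    "sale_type": {
        "buckets": [
            {"key": "hat", "doc_count": 3, "sales": {"value": 200.0}},
            {"key": "bag", "doc_count": 1, "sales": {"value": 150.0}},
        ]
    }
}

hats = select_bucket_metric("sale_type['hat']>sales", agg_results)
bags = select_bucket_metric("sale_type['bag']>sales", agg_results)
ratio = hats / bags
```

The point of the sketch is that a calculation such as `ratio` needs exactly two named buckets, which otherwise requires a dedicated `filter` agg per bucket.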

- Adjusts documentation
- Adds a barebones AggregatorTestCase for bucket_script
- Tweaks AggTestCase to use getMockScriptService() for reductions and
pipelines.  Previously pipelines could just pass in a script service
for testing, but this didn't work for regular aggs.  The new
getMockScriptService() method fixes that issue, but needs to be used
for pipelines too.  This had a knock-on effect of touching MovFn,
AvgBucket and ScriptedMetric

This adds a new gap policy, which does nothing.  Gaps are presented
as-is to the aggregator and up to the agg to decide what to do.

Some aggs cannot handle gaps (derivatives, etc) and will treat the
empty bucket as if it were missing.  Other aggs (bucket_script, moving
fn) will pass the empty bucket to the user for evaluation.

This also adds a `params.doc_count` property which is accessible from
pipeline scripts, allowing the user to differentiate between null
values and empty buckets.
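
The two behaviours described above can be modelled in a few lines of Python. This is a simplified model, not the Elasticsearch implementation: buckets are `(doc_count, value)` tuples, the function names are invented, and the choice to compute the next derivative against the last usable value is an assumption of the sketch.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Toy bucket: (doc_count, metric value); value is None for an empty bucket.
Bucket = Tuple[int, Optional[float]]


def is_gap(value: Optional[float]) -> bool:
    return value is None or math.isnan(value)


def derivative_with_none_policy(buckets: List[Bucket]) -> List[Optional[float]]:
    """Derivative-like agg: cannot use a gap, so it emits nothing for it."""
    results: List[Optional[float]] = []
    previous: Optional[float] = None
    for _doc_count, value in buckets:
        if is_gap(value):
            results.append(None)  # treated as if the bucket were missing
            continue
        results.append(None if previous is None else value - previous)
        previous = value
    return results


def bucket_script_with_none_policy(
    buckets: List[Bucket], script: Callable[[Dict[str, object]], float]
) -> List[float]:
    """bucket_script-like agg: the gap is handed to the user's script as-is."""
    return [script({"doc_count": dc, "value": v}) for dc, v in buckets]


buckets = [(3, 10.0), (0, None), (2, 16.0)]
derivatives = derivative_with_none_policy(buckets)
scripted = bucket_script_with_none_policy(
    buckets, lambda p: 0.0 if p["doc_count"] == 0 else p["value"] * 2
)
```

In this model the middle bucket produces no derivative, while the script sees `doc_count == 0` and can substitute its own value.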
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo

@polyfractal polyfractal removed the WIP label Aug 5, 2019
@colings86 colings86 added v7.5.0 and removed v7.4.0 labels Aug 30, 2019
@jimczi jimczi added v7.6.0 and removed v7.5.0 labels Nov 12, 2019
@ronid

ronid commented Jan 1, 2020

Any news? Can't wait to test this and see if my exception is fixed 🙏

@polyfractal polyfractal added v7.7.0 and removed v7.6.0 labels Jan 15, 2020
@bpintea bpintea added v7.8.0 and removed v7.7.0 labels Mar 25, 2020
@rjernst rjernst added the Team:Analytics label May 4, 2020
@mark-vieira mark-vieira added v8.5.0 and removed v8.4.0 labels Jul 27, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Labels
:Analytics/Aggregations, >enhancement, Team:Analytics, v9.1.0
Development

Successfully merging this pull request may close these issues.

Bucket script aggregation returns invalid value for missing docs