-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Save a little space in agg tree #53730
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This drop the "top level" pipeline aggregators from the aggregation result tree which should save a little memory and a few serialization bytes. Perhaps more imporantly, this provides a mechanism by which we can remove *all* pipelines from the aggregation result tree. This will save quite a bit of space when pipelines are deep in the tree. Sadly, doing this isn't simple because of backwards compatibility. Nodes before 7.7.0 *need* those pipelines. We provide them by setting passing a `Supplier<PipelineTree>` into the root of the aggregation tree that we only call if we need to serialize to a version before 7.7.0. This solution works for cross cluster search because we always reduce the aggregations in each remote cluster and then forward them back to the coordinating node. Its quite possible that the coordinating node needs the pipeline (say it is version 7.1.0) and the gateway node in the remote cluster doesn't (version 7.7.0). In that case the data nodes won't send the pipeline aggregations back to the gateway node. Critically, the gateway node *will* send the pipeline aggregations back to the coordinating node. This is all managed with that `Supplier<PipelineTree>`, but *how* it is managed is a bit tricky.
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments mainly around testing. Think this looks good! Serialization changes always give me the cold sweats but I can't see any flaws currently with the approach.
@@ -115,6 +115,39 @@ | |||
- match: { aggregations.cluster.buckets.0.key: "local_cluster" } | |||
- match: { aggregations.cluster.buckets.0.doc_count: 5 } | |||
|
|||
# once more, this time with a pipeline agg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, should we maybe split this out into it's own test? Just thinking that long multi-step yaml tests can be tricky to debug sometimes.
Haven't looked at the rest of the tests in this yaml though, so it might not be easy to move the indexing (or whatever else) steps up to the setup though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll move it, sure! They can get hard to debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - this is a test that doesn't clear indices after it runs. So moving things around is a bit more complex than we'd like to be honest. I can do it, but I think it should wait for a followup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, no worries then. not worth re-arranging everything for 👍
* Setting the pipeline tree source to null is here is correct but | ||
* only because we don't immediately pass the InternalAggregations | ||
* off to another node. Instead, we always reduce together with | ||
* many aggregations and that always adds the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the comment trails off without finishing it's thought :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I do that somet
@@ -115,6 +115,39 @@ | |||
- match: { aggregations.cluster.buckets.0.key: "local_cluster" } | |||
- match: { aggregations.cluster.buckets.0.doc_count: 5 } | |||
|
|||
# once more, this time with a pipeline agg | |||
- do: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a similar test for the rolling-upgrade
module? Theoretically it should be the same as CCS, but it might also smoke out different issues due to heterogeneous serialization inside the same cluster (instead of funneling through a gateway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be great to have a "mixed cluster CCS" test. I talked that one through with @javanna and we don't have one now and probably don't want to build one just for this.
terms: | ||
field: f1.keyword | ||
aggs: | ||
s: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a non-top-level pipeline agg just to confirm they aren't affected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured I'd get it in my next PR about non-top-level pipeline aggs, but I'm happy to do it now!
@polyfractal, I think this is ready for another round! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Thanks @polyfractal ! |
I've updated all of the version numbers in the description to 7.8.0 because this is not making the 7.7.0 release train. |
This drop the "top level" pipeline aggregators from the aggregation result tree which should save a little memory and a few serialization bytes. Perhaps more imporantly, this provides a mechanism by which we can remove *all* pipelines from the aggregation result tree. This will save quite a bit of space when pipelines are deep in the tree. Sadly, doing this isn't simple because of backwards compatibility. Nodes before 7.7.0 *need* those pipelines. We provide them by setting passing a `Supplier<PipelineTree>` into the root of the aggregation tree that we only call if we need to serialize to a version before 7.7.0. This solution works for cross cluster search because we always reduce the aggregations in each remote cluster and then forward them back to the coordinating node. Its quite possible that the coordinating node needs the pipeline (say it is version 7.1.0) and the gateway node in the remote cluster doesn't (version 7.7.0). In that case the data nodes won't send the pipeline aggregations back to the gateway node. Critically, the gateway node *will* send the pipeline aggregations back to the coordinating node. This is all managed with that `Supplier<PipelineTree>`, but *how* it is managed is a bit tricky.
This drop the "top level" pipeline aggregators from the aggregation result tree which should save a little memory and a few serialization bytes. Perhaps more imporantly, this provides a mechanism by which we can remove *all* pipelines from the aggregation result tree. This will save quite a bit of space when pipelines are deep in the tree. Sadly, doing this isn't simple because of backwards compatibility. Nodes before 7.7.0 *need* those pipelines. We provide them by setting passing a `Supplier<PipelineTree>` into the root of the aggregation tree that we only call if we need to serialize to a version before 7.7.0. This solution works for cross cluster search because we always reduce the aggregations in each remote cluster and then forward them back to the coordinating node. Its quite possible that the coordinating node needs the pipeline (say it is version 7.1.0) and the gateway node in the remote cluster doesn't (version 7.7.0). In that case the data nodes won't send the pipeline aggregations back to the gateway node. Critically, the gateway node *will* send the pipeline aggregations back to the coordinating node. This is all managed with that `Supplier<PipelineTree>`, but *how* it is managed is a bit tricky.
Backport done! I'm going to leave the |
Updates a few versions in serialization because we didn't make the 7.7.0 release train.
This fixes pipeline aggregations used in cross cluster search from an older version of Elasticsearch to a newer version of Elasticsearch. I broke this in elastic#53730 when I was too aggressive in shutting off serialization of pipeline aggs. In particular, this comes up when the coordinating node is pre-7.8.0 and the gateway node is on or after 7.8.0. The fix is another step down the line to remove pipeline aggregators from the aggregation tree. Sort of. It create a new `List<PipelineAggregator>` member in `InternalAggregation` *but* it is only used for bwc serialization and it is fed by the mechanism established in elastic#53730 to read the pipelines from the
This fixes pipeline aggregations used in cross cluster search from an older version of Elasticsearch to a newer version of Elasticsearch. I broke this in #53730 when I was too aggressive in shutting off serialization of pipeline aggs. In particular, this comes up when the coordinating node is pre-7.8.0 and the gateway node is on or after 7.8.0. The fix is another step down the line to remove pipeline aggregators from the aggregation tree. Sort of. It create a new `List<PipelineAggregator>` member in `InternalAggregation` *but* it is only used for bwc serialization and it is fed by the mechanism established in #53730 to read the pipelines from the
This fixes pipeline aggregations used in cross cluster search from an older version of Elasticsearch to a newer version of Elasticsearch. I broke this in elastic#53730 when I was too aggressive in shutting off serialization of pipeline aggs. In particular, this comes up when the coordinating node is pre-7.8.0 and the gateway node is on or after 7.8.0. The fix is another step down the line to remove pipeline aggregators from the aggregation tree. Sort of. It create a new `List<PipelineAggregator>` member in `InternalAggregation` *but* it is only used for bwc serialization and it is fed by the mechanism established in elastic#53730 to read the pipelines from the
This fixes pipeline aggregations used in cross cluster search from an older version of Elasticsearch to a newer version of Elasticsearch. I broke this in #53730 when I was too aggressive in shutting off serialization of pipeline aggs. In particular, this comes up when the coordinating node is pre-7.8.0 and the gateway node is on or after 7.8.0. The fix is another step down the line to remove pipeline aggregators from the aggregation tree. Sort of. It create a new `List<PipelineAggregator>` member in `InternalAggregation` *but* it is only used for bwc serialization and it is fed by the mechanism established in #53730 to read the pipelines from the
This drop the "top level" pipeline aggregators from the aggregation
result tree which should save a little memory and a few serialization
bytes. Perhaps more imporantly, this provides a mechanism by which we
can remove all pipelines from the aggregation result tree. This will
save quite a bit of space when pipelines are deep in the tree.
Sadly, doing this isn't simple because of backwards compatibility. Nodes
before 7.8.0 need those pipelines. We provide them by setting passing
a
Supplier<PipelineTree>
into the root of the aggregation tree that weonly call if we need to serialize to a version before 7.8.0.
This solution works for cross cluster search because we always reduce
the aggregations in each remote cluster and then forward them back to
the coordinating node. Its quite possible that the coordinating node
needs the pipeline (say it is version 7.1.0) and the gateway node in the
remote cluster doesn't (version 7.8.0). In that case the data nodes
won't send the pipeline aggregations back to the gateway node.
Critically, the gateway node will send the pipeline aggregations back
to the coordinating node. This is all managed with that
Supplier<PipelineTree>
, but how it is managed is a bit tricky.