Document aggregation code generation #121644

Merged
merged 9 commits into from
Feb 11, 2025
Document aggregation code generation
idegtiarenko committed Feb 4, 2025
commit b3a89d7a1526046cea1309827464061db078a7f3
ivancea marked this conversation as resolved.
@@ -105,7 +105,53 @@
*
* <h3>Creating aggregators for your function</h3>
* <p>
- * Aggregators contain the core logic of your aggregation. That is, how to combine values, what to store, how to process data, etc.
+ * Aggregators contain the core logic of how to combine values, what to store, how to process data, etc.
* Currently, we rely on code generation (per aggregation, per type) to implement this functionality.
* This approach was chosen for performance reasons (namely, to avoid virtual method calls and boxing of primitive types).
* As a result, we cannot rely on interface implementations or generics.
* </p>
* <p>
* To implement the aggregation logic, create a class (typically named "${FunctionName}${Type}Aggregator").
ivancea marked this conversation as resolved.
* Annotate it with {@link org.elasticsearch.compute.ann.Aggregator} and {@link org.elasticsearch.compute.ann.GroupingAggregator}.
* The first is responsible for aggregating an entire data set, while the second is responsible for aggregating within groups (buckets).
* </p>
* <p>
* Before you start implementing it, please note that:
* <ul>
* <li>All methods must be public static</li>
* <li>
* The combine, combineStates, combineIntermediate, and evaluateFinal methods (see below) can be omitted and are then generated automatically
* when both the input type I and the mutable accumulator states SS and GS are primitive (DOUBLE, INT).
* </li>
* <li>TBD explain {@code IntermediateState}</li>
* <li>TBD explain special internal state {@code seen}</li>

With the warnings feature, there's also the "failed" state. Identical to seen. Never used in main though, only here: https://github.com/elastic/elasticsearch/pull/116170/files#diff-8a408014887a6dc87eed1f71346536fc77636245d5411714b6ba2cf265812538R18

Maybe worth mentioning, with the warnExceptions attribute

* </ul>
* </p>
* <p>
* Aggregation expects:
* <ul>
* <li>type SS (a mutable state used to accumulate the result of the aggregation) to be public, not an inner class, and to implement {@link org.elasticsearch.compute.aggregation.AggregatorState}</li>
* <li>type I (input to your aggregation function), usually primitive types and {@link org.apache.lucene.util.BytesRef}</li>
* <li>{@code SS init()} or {@code SS initSingle()} returns an empty, initialized aggregation state</li>
* <li>{@code void combine(SS state, I input)} or {@code SS combine(SS state, I input)} adds an input entry to the aggregation state</li>
* <li>{@code void combineIntermediate(SS state, intermediate states)} adds a serialized aggregation state to the current aggregation state (used to combine results across different nodes)</li>
* <li>{@code Block evaluateFinal(SS state, BigArrays? DriverContext?)} converts the inner state of the aggregation to the result column</li>
* </ul>
* </p>
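
The non-grouping contract listed above can be sketched roughly as follows. This is only an illustration written against hypothetical stand-in types so that it compiles without the Elasticsearch compute module: `SumLongState` and `SumLongAggregatorSketch` are invented names, the `@Aggregator` annotation and the `AggregatorState` interface are deliberately left out, the intermediate state is assumed to be a single partial sum, and a plain `long` stands in for the `Block` that a real `evaluateFinal` would return.

```java
// Hypothetical stand-in for the mutable state type SS.
// A real state would be public, top-level, and implement AggregatorState.
final class SumLongState {
    long sum;
}

// Hypothetical aggregator that follows the "all methods must be public static" rule.
final class SumLongAggregatorSketch {
    private SumLongAggregatorSketch() {}

    // SS init(): returns an empty, initialized state.
    public static SumLongState init() {
        return new SumLongState();
    }

    // void combine(SS state, I input): adds one input value to the state.
    public static void combine(SumLongState state, long input) {
        state.sum += input;
    }

    // combineIntermediate: folds a partial result (for example from another node) into the state.
    // Assumption: the intermediate state is passed as a single partial sum.
    public static void combineIntermediate(SumLongState state, long partialSum) {
        state.sum += partialSum;
    }

    // evaluateFinal: converts the state into the result.
    // Assumption: a plain long stands in for the result Block.
    public static long evaluateFinal(SumLongState state) {
        return state.sum;
    }
}
```

Keeping every method static and typed to one concrete input type mirrors the stated goal of avoiding virtual calls and boxing, at the cost of one such class per aggregation per type.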
* <p>
* Grouping aggregation expects:
* <ul>
* <li>type GS (a mutable state used to accumulate the result of the grouping aggregation) to be public, not an inner class, and to implement {@link org.elasticsearch.compute.aggregation.GroupingAggregatorState}</li>
* <li>type I (input to your aggregation function), usually primitive types and {@link org.apache.lucene.util.BytesRef}</li>

I think you mean:

Suggested change
- * <li>type I (input to your aggregation function), usually primitive types and {@link org.apache.lucene.util.BytesRef}</li>
+ * <li>type T (input to your aggregation function), usually primitive types and {@link org.apache.lucene.util.BytesRef}</li>

From the following comments.

* <li>{@code GS init()} or {@code GS initGrouping()} returns an empty, initialized grouping aggregation state</li>
* <li>{@code void combine(GS state, int groupId, T input)} adds an input entry to the corresponding group (bucket) of the grouping aggregation state</li>
* <li>{@code void combineStates(GS targetState, int targetGroupId, GS otherState, int otherGroupId)} merges one group of the other grouping aggregation state into the given group of the target state</li>
* <li>{@code void combineIntermediate(GS current, int groupId, intermediate states)} adds a serialized aggregation state to the current grouping aggregation state (used to combine results across different nodes)</li>
* <li>{@code Block evaluateFinal(GS state, IntVectorSelected, BigArrays? DriverContext?)} converts the inner state of the grouping aggregation to the result column</li>
* </ul>
* </p>
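
The grouping contract listed above could look roughly like the sketch below. It is an illustration under stated assumptions, not the generated code: `SumLongGroupingState` and `SumLongGroupingAggregatorSketch` are hypothetical names, the `@GroupingAggregator` annotation and the `GroupingAggregatorState` interface are omitted so the example stays self-contained, the intermediate state is assumed to be one partial sum per group, an `int[]` stands in for the selected group ids, and a `long[]` stands in for the result `Block`.

```java
import java.util.Arrays;

// Hypothetical stand-in for the grouping state type GS: one running sum per group id.
// A real state would be public, top-level, and implement GroupingAggregatorState.
final class SumLongGroupingState {
    private long[] sums = new long[0];

    // Adds a value to the given group, growing the backing array when needed.
    void add(int groupId, long value) {
        if (groupId >= sums.length) {
            sums = Arrays.copyOf(sums, Math.max(groupId + 1, sums.length * 2));
        }
        sums[groupId] += value;
    }

    // Returns the sum for a group, or 0 for a group that never received a value.
    long get(int groupId) {
        return groupId < sums.length ? sums[groupId] : 0L;
    }
}

// Hypothetical grouping aggregator; all methods are public static.
final class SumLongGroupingAggregatorSketch {
    private SumLongGroupingAggregatorSketch() {}

    // GS initGrouping(): returns an empty, initialized grouping state.
    public static SumLongGroupingState initGrouping() {
        return new SumLongGroupingState();
    }

    // combine: adds one input value to the bucket identified by groupId.
    public static void combine(SumLongGroupingState state, int groupId, long input) {
        state.add(groupId, input);
    }

    // combineStates: merges one group of another state into one group of the target state.
    public static void combineStates(
        SumLongGroupingState targetState, int targetGroupId,
        SumLongGroupingState otherState, int otherGroupId
    ) {
        targetState.add(targetGroupId, otherState.get(otherGroupId));
    }

    // combineIntermediate: folds a partial per-group result into the current state.
    // Assumption: the intermediate state is a single partial sum.
    public static void combineIntermediate(SumLongGroupingState current, int groupId, long partialSum) {
        current.add(groupId, partialSum);
    }

    // evaluateFinal: emits one result per selected group id.
    // Assumption: int[] and long[] stand in for the selected-groups vector and the result Block.
    public static long[] evaluateFinal(SumLongGroupingState state, int[] selected) {
        long[] result = new long[selected.length];
        for (int i = 0; i < selected.length; i++) {
            result[i] = state.get(selected[i]);
        }
        return result;
    }
}
```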
* <p>
*
* </p>
* <ol>
* <li>