-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Remove ordinal grouping operator #131133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove ordinal grouping operator #131133
Conversation
68ab79c
to
585aed2
Compare
585aed2
to
24878f6
Compare
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
@@ -130,6 +130,7 @@ default void add(int positionOffset, IntBlock groupIds) { | |||
|
|||
/** | |||
* Add the position-th row from the intermediate output of the given aggregator function to the groupId | |||
* TODO: Remove this method as the grouping operator has been removed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove this method in a follow-up.
I don't really see a performance hit at all in the nyc_taxis. Do you? It's only the |
Yes, that's correct. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM presuming I'm right about the perf changes comment I just made.
I'm super thankful this is only part of the LocalPhysicalPlanOptimizer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| STATS max(@timestamp), count(*) BY message | ||
| SORT message NULLS FIRST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ordinal grouping operator emits output sorted by the keys, while the hash operator outputs keys in arrival order.
The ordinals grouping operator was introduced to speed up aggregation before ordinal blocks and related optimizations in block hashes were available. However, this operator has several issues:
keyword
type and requiresdoc_values
.VALUES
aggregation using this operator (see Avoid O(N^2) in VALUES with ordinals grouping #130576).Although the ordinals grouping operator is slightly faster than the hash operator with ordinal blocks, its complexity now outweighs the benefits. This PR proposes removing the operator. Below is the NYC_taxis benchmark.
Closes #98963