-
Notifications
You must be signed in to change notification settings - Fork 181
Support eventstats command with Calcite
#3585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Lantao Jin <ltjin@amazon.com>
|
|
||
| private final UnresolvedExpression windowFunction; | ||
|
|
||
| private final List<UnresolvedExpression> windowFunctionList; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to add window boundary? For example, I want to execute a ppl equal to a sql like
select AVG(SUM(total_amount)) OVER (ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_3day (this is one of the moving avg, which we plan to support for t2visbuilder).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's easy to support it. but not this PR of eventstats command. We can extend this syntax after PPL lang unified. e.g support boundary when we implement streamstats command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a WindowFrame for further using.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Outdated
Show resolved
Hide resolved
| * ``eventstats``: Useful when you need to enrich events with statistical context for further analysis or filtering. Can be used mid-search to add statistics that can be used in subsequent commands. | ||
|
|
||
|
|
||
| Version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
Could we highlight version info? for instance, in title, evenstats (since 3.1.0). or maybe using version controled doc? Any thoughts? @dai-chen @qianheng-aws
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
highlighted in description and docs/user/ppl/index.rst
|
|
||
| Example:: | ||
|
|
||
| PPL> source=accounts | eventstats sum(age) by gender; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add an example explain how to handle missing value?
| if (BuiltinFunctionName.AVG == agg.get()) { | ||
|
|
||
| } else { | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| rows("John", "Canada", "Ontario", 4, 2023, 25, 4, 36.25, 20, 70), | ||
| rows("Jake", "USA", "California", 4, 2023, 70, 4, 36.25, 20, 70), | ||
| rows("Jane", "Canada", "Quebec", 4, 2023, 20, 4, 36.25, 20, 70), | ||
| rows("Hello", "USA", "New York", 4, 2023, 30, 4, 36.25, 20, 70)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in scheam, 2nd column is age, but in result, 2nd column is country?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data order is the actually schema order. Let me change to verifySchemaInOrder
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
|
@dai-chen please take another look. |
|
|
||
| static RexNode makeOver( | ||
| CalcitePlanContext context, | ||
| String name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[minor] Why not just pass in BuiltinFunctionName here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| RexWindowBound lowerBound = convert(context, windowFrame.getLower()); | ||
| RexWindowBound upperBound = convert(context, windowFrame.getUpper()); | ||
| switch (functionName) { | ||
| // There is no "avg" AggImplementor in Calcite, we have to change avg window |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a Calcite rule AggregateReduceFunctionsRule, it will do such transformation of rewriting avg to sum/count. Could we just use avg here and leverage that rule for transformation? It should introduce more refined handling for cases like NULL values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for informing this, let me check how to apply this rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After checking AggregateReduceFunctionsRule code, seems it could be applied by logical Aggregate, rather than logical Window. The converting made by SqlNode level (parser). We can open a follow-up ticket about self-dev rule for window.
| } | ||
| } | ||
|
|
||
| public void pushWindowPartitions(List<RexNode> partition) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems never used for these 3 APIs. And why should the window related context stored in this global context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will delete them. It is useless code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| .map( | ||
| groupCtx -> | ||
| (UnresolvedExpression) | ||
| new Alias( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question] Why do we always need to wrap byClause with Alias here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be a followup and it needs much code refactor for stats, not sure the original reason.
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* Support eventstats command with Calcite Signed-off-by: Lantao Jin <ltjin@amazon.com> * add doc Signed-off-by: Lantao Jin <ltjin@amazon.com> * fix IT Signed-off-by: Lantao Jin <ltjin@amazon.com> * address comments Signed-off-by: Lantao Jin <ltjin@amazon.com> * Fix conflicts Signed-off-by: Lantao Jin <ltjin@amazon.com> * address comments Signed-off-by: Lantao Jin <ltjin@amazon.com> * Support variance functions Signed-off-by: Lantao Jin <ltjin@amazon.com> * fix UT Signed-off-by: Lantao Jin <ltjin@amazon.com> * update doc Signed-off-by: Lantao Jin <ltjin@amazon.com> --------- Signed-off-by: Lantao Jin <ltjin@amazon.com> Signed-off-by: xinyual <xinyual@amazon.com>
Description
Support
eventstatscommand with CalciteThe
eventstatscommand actually equals to window function in SQLequals to
And the default window frame is
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGIn this PR, the supported window functions are
We will support more window functions in follow-up PRs since they are not supported in PPL-Spark
Related Issues
Resolves #3563
Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.