-
Notifications
You must be signed in to change notification settings - Fork 176
Support bin
command with Calcite
#3878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I haven't read through the implementations yet, but an initial attempt with the I tested it on the accounts index, where the max age is 40, and the minimum age is 20. The query
From the plan, the new
I guess you did not calculate the max and min, but directly using 1000 and 0? |
The PR description is very solid and in detail. It would be better if you could create an issue or link it to an existing issue for this PR. |
The But the start is 30, there shouldn't be 25 and 20 in the result. The logical plan it gives contains no start and end information either:
By the way, for values below the start, should they be converted to the start value or be kept as is? |
integ-test/src/test/java/org/opensearch/sql/calcite/big5/CalciteBinCommandBig5IT.java
Outdated
Show resolved
Hide resolved
Could you also add tests in |
Thanks for the reply, I have updated our implementation for |
I updated the implementation. Our current implementation now correctly calculates min(age)=20 and max(age)=40 dynamically, and importantly, now bins=4 doesn't create exactly 4 bins but rather creates at most 4 bins using "nice number" widths - so bins=4 for age 20-40 actually produces 3 bins (20-30, 30-40, 40-50) with width=10 instead of 4 bins with width=5, because now it prioritizes human-readable bin boundaries over exact bin counts. Also, we now modify the original age field in-place instead of creating a separate age_bin field |
Could you please give a precise definition of nice number in the documentation?
|
loadIndex(Index.BANK); | ||
loadIndex(Index.DATE_FORMATS); | ||
loadIndex(Index.WEBLOG); | ||
loadIndex(Index.TIME_TEST_DATA); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's not necessary to load this in ExplainIT since you did not use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved it to CalciteExplainIT
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Outdated
Show resolved
Hide resolved
|
||
protected final UnresolvedExpression field; | ||
|
||
@Nullable protected final String alias; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's okey but I don't suggest to use @Nullable
annotation. Instead, I suggest to use Optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use Optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LantaoJin Using optional as a field type will cause a warning, e.g. 'Optional<String>' used as type for field 'name'
. It seems that using Optional type as field / parameter types is discouraged? Java
'Optional' used as field or parameter type - JetBrains
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @yuancu. For now, there is no harmful to use Optional
as type of field 'name' because the UnresolvedPlan
itself is not serializable.
if (node.getAlias() != null) { | ||
handleBinWithAlias(context, binExpression, node.getAlias()); | ||
} else { | ||
handleBinWithoutAlias(context, binExpression, fieldName); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem but what a Claude-style refactor! These are so straightforward codes, not need to separate to small methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
/** | ||
* Handles bin operation with alias by adding the binned field as a new column. | ||
* | ||
* @param context the Calcite plan context | ||
* @param binExpression the bin expression to be aliased | ||
* @param aliasName the alias name for the new column | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's don't put comments especially for private method when the method name + parameter names explains enough. (It would just become a noise when method name is explicit enough)
+1
/** | ||
* Handles bin operation with alias by adding the binned field as a new column. | ||
* | ||
* @param context the Calcite plan context | ||
* @param binExpression the bin expression to be aliased | ||
* @param aliasName the alias name for the new column | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove the Javadoc, at least the L498~L500, there are noise code IMO.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Outdated
Show resolved
Hide resolved
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteBinCommandIT.java
Outdated
Show resolved
Hide resolved
integ-test/src/test/java/org/opensearch/sql/ppl/BinCommandIT.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Kai Huang <ahkcs@amazon.com>
core/src/main/java/org/opensearch/sql/calcite/utils/binning/SpanParser.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/calcite/utils/binning/SpanParser.java
Show resolved
Hide resolved
@qianheng-aws please take another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the size and complexity of this PR, it's challenging for me to thoroughly review all the changes and verify every logical detail. I'll provide a preliminary approval so my change request doesn't block progress, though please count this as a half an approval only.
I recommend obtaining approval from all contributors who have been heavily involved in reviewing this PR before merging. @ykmr1224 @qianheng-aws @LantaoJin @Swiddis
The backport to
To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-3878-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 71076f79cf6f7d365408766d5de77b125e57c9b5
# Push it to GitHub
git push --set-upstream origin backport/backport-3878-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev Then, create a pull request where the |
Thanks for the review and the preliminary approval. After discussing with the team, we’ve decided to go ahead and merge this PR first, and then address any additional concerns in follow-up issues. |
Signed-off-by: Kai Huang <ahkcs@amazon.com> Signed-off-by: Kai Huang <105710027+ahkcs@users.noreply.github.com> (cherry picked from commit 71076f7)
Signed-off-by: Kai Huang <ahkcs@amazon.com> Signed-off-by: Kai Huang <105710027+ahkcs@users.noreply.github.com> (cherry picked from commit 71076f7)
…4171) * Support ```bin``` command with Calcite (#3878) Signed-off-by: Kai Huang <ahkcs@amazon.com> Signed-off-by: Kai Huang <105710027+ahkcs@users.noreply.github.com> (cherry picked from commit 71076f7) * Fix Java version incompatibility Signed-off-by: Kai Huang <ahkcs@amazon.com> * remove unrelated tests Signed-off-by: Kai Huang <ahkcs@amazon.com> --------- Signed-off-by: Kai Huang <ahkcs@amazon.com> Signed-off-by: Kai Huang <105710027+ahkcs@users.noreply.github.com>
Summary
This PR implements a complete separation of the
bin
command'sspan
functionality from aggregationspan
logic, ensuring backward compatibility while introducing robust SPL-compatible binning behavior. This includes support for parameters such asspan
,bins
,minspan
,start
,end
, andaligntime
.Related Issues
Resolves #3876
Test Results
Tests were run on the following real datasets:
accounts.json
: 1000 records,age
values 20–40,balance
values ~1000–50000time_test_data.json
: 100 records, timestamps from 2025-07-28 to 2025-08-011.
span
Parameter (Numeric)Query:
Result:
2.
bins
ParameterQuery:
Result:
3.
minspan
ParameterQuery:
Result:
4.
start
andend
ParametersQuery:
Result:
Logical Plan:
5.
aligntime="@d"
ParameterQuery:
Result:
Bin Performance Test