-
Couldn't load subscription status.
- Fork 176
Add math udf #3390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add math udf #3390
Conversation
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
- Additionally implement user-defined ConvFunction Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Add math udf
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And rename UserDefineFunctionUtils to UserDefinedFunctionUtils to unify the naming style
| import org.apache.calcite.sql.type.SqlTypeName; | ||
| import org.opensearch.sql.calcite.CalcitePlanContext; | ||
| import org.opensearch.sql.calcite.udf.mathUDF.SqrtFunction; | ||
| import org.opensearch.sql.calcite.udf.mathUDF.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix this
| // Convert to the target base; BigInteger's toString(...) yields lowercase above 9 | ||
| String result = value.toString(toBase); | ||
|
|
||
| // Convert to uppercase to align with MySQL's behavior |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why align with MySQL's behavior here? Have you checked the exists behavior in v2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In V2, it returns everything in lowercase. I'll fix it.
| case "LOG10": | ||
| return SqlStdOperatorTable.LOG10; | ||
| case "MOD": | ||
| return TransferUserDefinedFunction(ModFunction.class, "MOD", ReturnTypes.DOUBLE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between this and https://github.com/apache/calcite/blob/e5e7faeff5985bc1b2342144b2bd31ca8ea84d3a/core/src/main/java/org/apache/calcite/sql/fun/SqlStdOperatorTable.java#L1719?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MOD in OpenSearch supports MOD(DOUBLE, LONG), e.g. MOD(3.1, 2) = 1.1.
However, the MOD defined in Calcite supports only long parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment the explanation in code
| case "CEILING": | ||
| return SqlStdOperatorTable.CEIL; | ||
| case "CONV": | ||
| return TransferUserDefinedFunction(ConvFunction.class, "CONVERT", ReturnTypes.BIGINT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between this and https://github.com/apache/calcite/blob/e5e7faeff5985bc1b2342144b2bd31ca8ea84d3a/core/src/main/java/org/apache/calcite/sql/fun/SqlStdOperatorTable.java#L1636
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are defined differently. According to its implementation (definition & underlying implementation), the CONVERT in MySQL is to convert between charsets like UTF-8 and LATIN1, whereas the CONV in OpenSearch is to convert between different base, e.g. binary to hex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
| String expectedLogical = | ||
| "LogicalProject(ASIN=[$8])\n" | ||
| + " LogicalSort(fetch=[2])\n" | ||
| + " LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4]," | ||
| + " SAL=[$5], COMM=[$6], DEPTNO=[$7], ASIN=[ASIN(0)])\n" | ||
| + " LogicalTableScan(table=[[scott, EMP]])\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should verify the precise content rather than the entire query plan. For example, if LogicalSort does not appear in the query plan, all irrelevant unit tests must be updated accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, @yuancu please simply the UTs and complicate the ITs
| @Override | ||
| public Object eval(Object... args) { | ||
| if (args.length < 2) { | ||
| throw new IllegalArgumentException("At least two arguments are required"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make the error message more precise. "Mod function required two arguments, but it has args.length"
|
|
||
| return num0 % num1; // Handles both float and double cases | ||
| } else { | ||
| throw new IllegalArgumentException("Invalid argument types: Expected numeric values"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make the error message more precise.
| throw new ArithmeticException("Modulo by zero is not allowed"); | ||
| } | ||
|
|
||
| return num0 % num1; // Handles both float and double cases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments from LLM? if so, remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is above "Modulo" a typo?
|
|
||
| import org.opensearch.sql.calcite.udf.UserDefinedFunction; | ||
|
|
||
| public class ModFunction implements UserDefinedFunction { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider add UT?
| public class ConvFunction implements UserDefinedFunction { | ||
| @Override | ||
| public Object eval(Object... args) { | ||
| Object number = args[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not check args.length in ConvFunction? and check args.length in ModFunction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And need some UT or IT for edge case
…uble.NaN and Float.NaN to null Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Handle Edge Cases & Update UT/ITs for Math Functions
|
I found following problems during testing:
|
Java result make sense to me, also it align with Spark-SQL. |
| return SqlLibraryOperators.LOG2; | ||
| case "LOG10": | ||
| return SqlStdOperatorTable.LOG10; | ||
| case "MOD": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to
case "MOD":
case "%":
....
|
@xinyual could you resolve the conflicts again? |
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- remove comparision between string and integers - correct thrown error types Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- add alias % to MOD - return negative when the dividend is negative Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Correct Math UDF Integration Tests
Already resolve them. |
|
@yuancu spotlessJavaCheck FAILED |
| loadIndex(Index.STATE_COUNTRY_WITH_NULL); | ||
| } | ||
|
|
||
| private static boolean containsMessage(Throwable throwable, String message) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: use verifyErrorMessageContains
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One problem in this IT is all the test cases are applying on literals instead of fields.
Could you add some IT for fields with
loadIndex(Index.DATA_TYPE_NUMERIC);
loadIndex(Index.DATA_TYPE_NONNUMERIC);
Signed-off-by: xinyual <xinyual@amazon.com>
Fixed now. |
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- additionally enrich math ITs with fields calculations Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Improve Math ITs & Correct MOD Function
|
Fix this please |
7f8cbfe
into
opensearch-project:feature/calcite-engine
|
LGTM, approved. Merging to dev branch. |
* add math udfs Signed-off-by: xinyual <xinyual@amazon.com> * add log argument Signed-off-by: xinyual <xinyual@amazon.com> * Add math function unit tests - Additionally implement user-defined ConvFunction Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Add integration tests for Calcite math functions Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Rename CalcitePPLMathFunctionsIT to CalcitePPLBuiltinFunctionIT Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * add license Signed-off-by: xinyual <xinyual@amazon.com> * apply spot Signed-off-by: xinyual <xinyual@amazon.com> * Update the implementation of CONV function to align with v2's behavior - Rename UserDefineFunctionUtils to UserDefinedFunctionUtils Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Improve code style: - enforce uniform parameter number check - comment on differences from calcite's implementation if necessary Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Simplify Calcite PPL math function unit tests Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Alter MOD and SQRT UDF to conform to documented behaviors - return null with invalid (zero, negative) arguments - return wider type for mod Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Complicate math integration tests - edge cases for UDF - combine operations or clauses Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Handle NULL return in ASIN, ACOS, SQRT and POW by convert returned Double.NaN and Float.NaN to null Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Apply spotless on math UDFs and their tests Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Remove unnecessary Double cast in SQRT UDF Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Convert returned Double.NaN and Float.NaN from math UDFs to LITERAL_NULL Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Correct math UDF integration tests - remove comparision between string and integers - correct thrown error types Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Update MOD UDF - add alias % to MOD - return negative when the dividend is negative Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Modify substring ITs Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * apply spot Signed-off-by: xinyual <xinyual@amazon.com> * Replace containsMessage with verifyErrorMessageContains in math ITs Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Correct MOD return types - additionally enrich math ITs with fields calculations Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * fix UT Signed-off-by: xinyual <xinyual@amazon.com> --------- Signed-off-by: xinyual <xinyual@amazon.com> Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> Co-authored-by: xinyual <xinyual@amazon.com> Co-authored-by: Yuanchun Shen <yuanchu@amazon.com> Signed-off-by: xinyual <xinyual@amazon.com>
Description
This PR add math built-in functions.
And also add corresponding IT for them.
Due to the lack of IT in spark PPL, we also implement some by ourselves.
The integration tests covered by Spark are copied to this IT (Apart from
floor). The rest is implemented by ourselves../gradlew :integ-test:integTest --tests 'CalciteIT' succeed in local
Related Issues
Resolves #3398
Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.