Skip to content

Conversation

@xinyual
Copy link
Contributor

@xinyual xinyual commented Mar 6, 2025

Description

This PR add math built-in functions.
And also add corresponding IT for them.
Due to the lack of IT in spark PPL, we also implement some by ourselves.

Functions Covered by Spark IT Covered by IT
abs Y Y
acos   Y
asin   Y
atan   Y
atan2   Y
cbrt   Y
ceiling Y Y
conv   Y
cos   Y
cot   Y
crc32   Y
degrees   Y
e   Y
exp   Y
floor Y Y (Modified)
ln Y Y
log   Y
log2   Y
log10   Y
mod Y Y
pi   Y
pow Y Y
radians   Y
rand   Y
round   Y
sign   Y
sin   Y
sqrt Y Y

The integration tests covered by Spark are copied to this IT (Apart from floor). The rest is implemented by ourselves.
./gradlew :integ-test:integTest --tests 'CalciteIT' succeed in local

Related Issues

Resolves #3398

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

xinyual and others added 6 commits March 5, 2025 10:29
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
- Additionally implement user-defined ConvFunction

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
xinyual added 2 commits March 6, 2025 19:49
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
@LantaoJin LantaoJin added the calcite calcite migration releated label Mar 7, 2025
Copy link
Member

@LantaoJin LantaoJin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And rename UserDefineFunctionUtils to UserDefinedFunctionUtils to unify the naming style

import org.apache.calcite.sql.type.SqlTypeName;
import org.opensearch.sql.calcite.CalcitePlanContext;
import org.opensearch.sql.calcite.udf.mathUDF.SqrtFunction;
import org.opensearch.sql.calcite.udf.mathUDF.*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix this

// Convert to the target base; BigInteger's toString(...) yields lowercase above 9
String result = value.toString(toBase);

// Convert to uppercase to align with MySQL's behavior
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why align with MySQL's behavior here? Have you checked the exists behavior in v2?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In V2, it returns everything in lowercase. I'll fix it.

case "LOG10":
return SqlStdOperatorTable.LOG10;
case "MOD":
return TransferUserDefinedFunction(ModFunction.class, "MOD", ReturnTypes.DOUBLE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@yuancu yuancu Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MOD in OpenSearch supports MOD(DOUBLE, LONG), e.g. MOD(3.1, 2) = 1.1.

However, the MOD defined in Calcite supports only long parameters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment the explanation in code

case "CEILING":
return SqlStdOperatorTable.CEIL;
case "CONV":
return TransferUserDefinedFunction(ConvFunction.class, "CONVERT", ReturnTypes.BIGINT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@yuancu yuancu Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are defined differently. According to its implementation (definition & underlying implementation), the CONVERT in MySQL is to convert between charsets like UTF-8 and LATIN1, whereas the CONV in OpenSearch is to convert between different base, e.g. binary to hex.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Comment on lines 67 to 72
String expectedLogical =
"LogicalProject(ASIN=[$8])\n"
+ " LogicalSort(fetch=[2])\n"
+ " LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],"
+ " SAL=[$5], COMM=[$6], DEPTNO=[$7], ASIN=[ASIN(0)])\n"
+ " LogicalTableScan(table=[[scott, EMP]])\n";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should verify the precise content rather than the entire query plan. For example, if LogicalSort does not appear in the query plan, all irrelevant unit tests must be updated accordingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, @yuancu please simply the UTs and complicate the ITs

@Override
public Object eval(Object... args) {
if (args.length < 2) {
throw new IllegalArgumentException("At least two arguments are required");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make the error message more precise. "Mod function required two arguments, but it has args.length"


return num0 % num1; // Handles both float and double cases
} else {
throw new IllegalArgumentException("Invalid argument types: Expected numeric values");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make the error message more precise.

throw new ArithmeticException("Modulo by zero is not allowed");
}

return num0 % num1; // Handles both float and double cases
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments from LLM? if so, remove it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is above "Modulo" a typo?


import org.opensearch.sql.calcite.udf.UserDefinedFunction;

public class ModFunction implements UserDefinedFunction {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider add UT?

public class ConvFunction implements UserDefinedFunction {
@Override
public Object eval(Object... args) {
Object number = args[0];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not check args.length in ConvFunction? and check args.length in ModFunction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And need some UT or IT for edge case

yuancu and others added 4 commits March 11, 2025 09:58
…uble.NaN and Float.NaN to null

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Handle Edge Cases & Update UT/ITs for Math Functions
@yuancu
Copy link
Collaborator

yuancu commented Mar 11, 2025

I found following problems during testing:

  • MOD function handles negative values inconsistently in V2
    MOD(-3, 2) = 1, while MOD(-3.1, 2) = -1.1.
    In contrast, -3 % 2 = -1, -3.1 % 2 = -1.1 in Java; -3 % 2 = 1, -3.1 % 2 = 0.9 in Python.
    The submitted implementation in this PR always returns positive values as that in Python.
  • ROUND is documented to return different types based on its arguments. But in V2 it always return double-typed values (which makes more sense IMO). Maybe we should consider update/correct the document?

@penghuo
Copy link
Collaborator

penghuo commented Mar 13, 2025

MOD function handles negative values inconsistently in V2
MOD(-3, 2) = 1, while MOD(-3.1, 2) = -1.1.
In contrast, -3 % 2 = -1, -3.1 % 2 = -1.1 in Java; -3 % 2 = 1, -3.1 % 2 = 0.9 in Python.
The submitted implementation in this PR always returns positive values as that in Python.

Java result make sense to me, also it align with Spark-SQL.

spark-sql (default)> select MOD(-3, 2), MOD(-3.1, 2);
mod(-3, 2)	mod(-3.1, 2)
-1	-1.1

return SqlLibraryOperators.LOG2;
case "LOG10":
return SqlStdOperatorTable.LOG10;
case "MOD":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to

case "MOD":
case "%":
....

@LantaoJin
Copy link
Member

@xinyual could you resolve the conflicts again?

xinyual and others added 6 commits March 17, 2025 14:23
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- remove comparision between string and integers
- correct thrown error types

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- add alias % to MOD
- return negative when the dividend is negative

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Correct Math UDF Integration Tests
@xinyual
Copy link
Contributor Author

xinyual commented Mar 17, 2025

@xinyual could you resolve the conflicts again?

Already resolve them.

@LantaoJin
Copy link
Member

@yuancu spotlessJavaCheck FAILED

loadIndex(Index.STATE_COUNTRY_WITH_NULL);
}

private static boolean containsMessage(Throwable throwable, String message) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: use verifyErrorMessageContains

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem in this IT is all the test cases are applying on literals instead of fields.
Could you add some IT for fields with

    loadIndex(Index.DATA_TYPE_NUMERIC);
    loadIndex(Index.DATA_TYPE_NONNUMERIC);

Signed-off-by: xinyual <xinyual@amazon.com>
@xinyual
Copy link
Contributor Author

xinyual commented Mar 18, 2025

@yuancu spotlessJavaCheck FAILED

Fixed now.

yuancu and others added 3 commits March 18, 2025 19:18
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- additionally enrich math ITs with fields calculations

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Improve Math ITs & Correct MOD Function
@LantaoJin
Copy link
Member

Fix this please

CalcitePPLMathFunctionTest > testAbsWithOverriding FAILED
    java.lang.AssertionError: 
    Expected: is "LogicalProject(SAL0=[ABS(-30)])\n  LogicalTableScan(table=[[scott, EMP]])\n"
         but: was "LogicalProject(SAL=[ABS(-30)])\n  LogicalTableScan(table=[[scott, EMP]])\n"

Signed-off-by: xinyual <xinyual@amazon.com>
@LantaoJin LantaoJin merged commit 7f8cbfe into opensearch-project:feature/calcite-engine Mar 20, 2025
18 of 20 checks passed
@LantaoJin
Copy link
Member

LGTM, approved. Merging to dev branch.

penghuo pushed a commit that referenced this pull request Jun 16, 2025
* add math udfs

Signed-off-by: xinyual <xinyual@amazon.com>

* add log argument

Signed-off-by: xinyual <xinyual@amazon.com>

* Add math function unit tests
- Additionally implement user-defined ConvFunction

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Add integration tests for Calcite math functions

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Rename CalcitePPLMathFunctionsIT to CalcitePPLBuiltinFunctionIT

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* add license

Signed-off-by: xinyual <xinyual@amazon.com>

* apply spot

Signed-off-by: xinyual <xinyual@amazon.com>

* Update the implementation of CONV function to align with v2's behavior
- Rename UserDefineFunctionUtils to UserDefinedFunctionUtils

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Improve code style:
- enforce uniform parameter number check
- comment on differences from calcite's implementation if necessary

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Simplify Calcite PPL math function unit tests

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Alter MOD and SQRT UDF to conform to documented behaviors
- return null with invalid (zero, negative) arguments
- return wider type for mod

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Complicate math integration tests
- edge cases for UDF
- combine operations or clauses

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Handle NULL return in ASIN, ACOS, SQRT and POW by convert returned Double.NaN and Float.NaN to null

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Apply spotless on math UDFs and their tests

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Remove unnecessary Double cast in SQRT UDF

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Convert returned Double.NaN and Float.NaN from math UDFs to LITERAL_NULL

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Correct math UDF integration tests
- remove comparision between string and integers
- correct thrown error types

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Update MOD UDF
- add alias % to MOD
- return negative when the dividend is negative

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Modify substring ITs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* apply spot

Signed-off-by: xinyual <xinyual@amazon.com>

* Replace containsMessage with verifyErrorMessageContains in math ITs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Correct MOD return types
- additionally enrich math ITs with fields calculations

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* fix UT

Signed-off-by: xinyual <xinyual@amazon.com>

---------

Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Co-authored-by: xinyual <xinyual@amazon.com>
Co-authored-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

calcite calcite migration releated

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants