Skip to content

Conversation

@yuancu
Copy link
Collaborator

@yuancu yuancu commented Apr 23, 2025

Description

Refactor existing math, datetime, text, condition UDFs to PPLFuncImpTable.

Why self-implementing instead of leveraging Calcite builtin functions

  • Not implemented by calcite

    • CONV
    • CRC32 (implemented in a newer version of calcite)
    • E
    • LOCATE
  • Different definition

    • MOD & DIVIDE: return types (ours return wider type; calcite implementation return divisor type); handling of divisor 0 (ours return null)
  • Datetime related: we use UDT and accepts STRING most of the time.

Note:

  • UDAFs are kept untouched.

TODOs:

  • Fix filled date problems related to function properties.

Related Issues

Resolves #3524

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

xinyual and others added 30 commits April 14, 2025 14:04
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- Problem: FunctionName.of(UPPER_CASE_I) could not be properly resolved since I is converted to dotless ı

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- date part: hour, hour_of_day, month_of_year, month, dayofweek, day_of_week, day, dayofyear, day_of_year, dayofmonth, day_of_month, minute_of_day, minute_of_hour, minute, quarter, second, second_of_minute, microsecond
- date/time manipulation: addtime, adddate, date_sub, date_add, subtime, subdate, extract

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
by creating an adapt function to convert an expr function to a UDF

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…imediff UDFs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
use timestamp to replace post process
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…ts logical plan

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
… another PR

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…primitive types

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
yuancu added 3 commits May 7, 2025 15:51
…ementation

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…tion

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
public static SqlReturnTypeInference TIME_APPLY_RETURN_TYPE =
opBinding -> {
RelDataType temporalType = opBinding.getOperandType(0);
if (OpenSearchTypeFactory.convertRelDataTypeToSqlTypeName(temporalType)
Copy link
Collaborator

@qianheng-aws qianheng-aws May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's confusing to use SqlTypeName.TIME here, temporalType here is actually UDT whose real SqlTypeName is VARCHAR actually. As discussed, we can determine the return type based on whether temporalType is AbstractExprRelDataType with getUDT equal to EXPR_TIME

Could we remove this API convertRelDataTypeToSqlTypeName? Its logic doesn't make sense.

Copy link
Collaborator Author

@yuancu yuancu May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the convertRelDataTypeToSqlTypeName API and adopted convertRelDataTypeToExprType to compare types, so that I can use a more unified interface to compare not only UDT but also builtin types like String. The converted ExprType is also widely used in the conversion to ExprValue (to reuse v2's implementation). Is the new implementation more logically acceptable?

public final class DateTimeApplyUtils {
private DateTimeApplyUtils() {}

public static Instant applyInterval(Instant base, Duration interval, boolean isAdd) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems never used.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. It was only used in previous implementations.

yuancu added 4 commits May 8, 2025 13:59
…Type comparision

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…java object and expr type separately

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…return types (instead of Object)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
import org.opensearch.sql.expression.function.ImplementorUDF;

/** <code>SQRT(value)</code> returns the square root of a number. */
public class SqrtFunction extends ImplementorUDF {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calcite has sqrt function?

0: jdbc:calcite:model=src/test/resources/mode> select sqrt(9);
+--------+
| EXPR$0 |
+--------+
| 3.0    |
+--------+
1 row selected (0.588 seconds)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQRT is only declared in SqlStdOperatorTable, but not implemented in RexImpTable. I.e. there isn't a statement like defineMethod(SQRT, BuiltInMethod.SQRT.method, NullPolicy.STRICT) in RexImpTable.

If I register sqrt to the calcite SQRT operator, it will throw an error at runtime:

...
java.lang.RuntimeException: java.sql.SQLException: Error while preparing plan [LogicalProject(name=[$0], age=[$5], month=[$3])
              LogicalFilter(condition=[AND(=(SQRT(POWER($5, 2)), 30.0E0), =(CBRT(POWER($3, 3)), 4))])
    ...
    java.lang.IllegalStateException: Unable to implement EnumerableCalc(expr#0..11=[{inputs}], expr#12=[2], expr#13=[POWER($t5, $t12)], expr#14=[SQRT($t13)], expr#15=[30.0E0:DOUBLE], expr#16=[=($t14, $t15)], expr#17=[3], expr#18=[POWER($t3, $t17)], expr#19=[CBRT($t18)], expr#20=[4], expr#21=[=($t19, $t20)], expr#22=[AND($t16, $t21)], name=[$t0], age=[$t5], month=[$t3], $condition=[$t22]): rowcount = 2.25, cumulative cost = {102.25 rows, 2801.0 cpu, 0.0 io}, id = 1388
     CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_state_country]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 1353

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qianheng-aws pointed out that SQRT is converted to POWER(*, 0.5). I've removed the sqrt udf and register the operator to power with 0.5 as the second parameter instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qianheng-aws pointed out that SQRT is converted to POWER(*, 0.5). I've removed the sqrt udf and register the operator to power with 0.5 as the second parameter instead.

Since Calcite already has the SQRT function definition in SqlStdOperatorTable and internal implementation (converting to POWER(*, 0.5) is also a kind of implementation), why do we still need to explicitly add this conversion in our side? @yuancu

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our implementation, when using operators from Calcite, their implementations have to exist in RexImpTable. Otherwise, it will throw an error like cannot translate call SQRT. See RexToLixTranslater.java#L1428.

In Calcite SQL, the conversion happens in SqlToRelConverter. See SqlNodeToRexConverterImpl.java#L59. However, we do not have access to SqlCall in our implementation. Therefore, we cannot directly reuse Calcite's conversion of SQRT function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuancu We can similar convert in PPLFuncImplTable.

      register(SQRT,
          ((FunctionImp1)
          (builder, arg1) -> builder.makeCall(SqlStdOperatorTable.POWER, arg1,
              builder.makeApproxLiteral(BigDecimal.valueOf(0.5)))));

@qianheng-aws IMO, PPLFuncImplTable support 2 ways to define function. 1) Use existing SqlFunctions. 2) UDF. Is it correct?

Copy link
Collaborator Author

@yuancu yuancu May 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I have registered SQRT in this fashion.

yuancu added 6 commits May 9, 2025 11:03
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
qianheng-aws
qianheng-aws previously approved these changes May 12, 2025
@yuancu yuancu requested a review from LantaoJin May 12, 2025 05:33
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
@penghuo penghuo merged commit eadeca2 into opensearch-project:main May 14, 2025
22 checks passed
penghuo pushed a commit that referenced this pull request Jun 16, 2025
* refactor using timestmap

Signed-off-by: xinyual <xinyual@amazon.com>

* refactor for timestmap

Signed-off-by: xinyual <xinyual@amazon.com>

* transfer basic functions

Signed-off-by: xinyual <xinyual@amazon.com>

* add functions

Signed-off-by: xinyual <xinyual@amazon.com>

* Refactor math UDFs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor text UDFs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Fix function name resolving when system using Turkic languages
- Problem: FunctionName.of(UPPER_CASE_I) could not be properly resolved since I is converted to dotless ı

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor date part, date/time manipulation UDFs
- date part: hour, hour_of_day, month_of_year, month, dayofweek, day_of_week, day, dayofyear, day_of_year, dayofmonth, day_of_month, minute_of_day, minute_of_hour, minute, quarter, second, second_of_minute, microsecond
- date/time manipulation: addtime, adddate, date_sub, date_add, subtime, subdate, extract

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor current, name, format UDFs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor convert tz, date/timestamp diff UDFs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor date / time functions with non-date/time arguments
by creating an adapt function to convert an expr function to a UDF

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor datetime, sec_to_time, time_to_sec, sysdate, timestampadd, timediff UDFs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor week, nullif, ifnull, if UDFs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Remove BuiltinFunctionUtils and unnecessary UDFs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Patch: throw proper semantic errors for date_part-related udfs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Fix bugs in datetime UDFs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor: move all udf to org.opensearch.sql.expression.function.udf

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* use timestamp to replace post process

Signed-off-by: xinyual <xinyual@amazon.com>

* Add more ITs for last day & restore Prometheus settings

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Fix euler with UDF instead of directly returning a literal to match its logical plan

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Reimplment last_day UDF with expr methods

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Ignore tests related to parameter validaton temporarily since it's in another PR

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Deprecate timestamp string as data context for function properties

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor: unify datetime udf structures

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Apply spotless & fix zone id acquirement

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Fix unixtimestamp from double when locale uses non-arabic numbers

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Box operands so that relection calls match signatures containing non-primitive types

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Use system default time zone when restoring function properties

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor return types

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Use builtin case and coalesce to implement if, ifnull, nullif UDFs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Replace replace UDF with builtin implementations & add docs for replace, locate

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Correct locate function documents

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Separate adddate and date_add implementations for readability

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Add class documentations for UDFs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Create PPLReturnTypes class to store common return types that it not covered in calcite ReturnTypes class

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Remove udf operators with the same implementations but different names

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor: Move type comparison earlier in Add/Sub Date function implementation

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor: update add/sub time's implementation with adaptExprMethodWithPropertiesToUDF

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor: Move type comparison earlier in last day function implementation

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor: implement date function with adaptExprMethodToUDF

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor: Move type comparison earlier in extract function implementation

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor: Move type comparison earlier in format function implementation

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor: Move type comparison earlier in timestampdiff function implementation

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor: Move type comparison earlier in weekday function implementation

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor: split add and sub date to 2 static functions

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Eliminate convertRelDataTypeToSqlTypeName API, replacing it with ExprType comparision

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor: pass expr values to udf implementations instead of passing java object and expr type separately

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor: unify types in udf to use expr types (deprecating SqlTypeName)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor: change return types of UDF implementations to their actual return types (instead of Object)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Add a test to unix_timestamp with microseconds

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Minor: fix calcite datetime tests

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Replace sqrt udf with a call to calcite's power(x, 0.5)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Replace locate udf with SqlStdOperatorTable.POSITION

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Fix sqrt logical plan test

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Fix expectedSparkSql in sqrt test

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

---------

Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Co-authored-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
@yuancu yuancu deleted the refactor-udf branch August 25, 2025 06:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

calcite calcite migration releated maintenance Improves code quality, but not the product

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Umbrella] Migrate existing functions to PPLFuncImpTable

6 participants