-
Notifications
You must be signed in to change notification settings - Fork 181
Migrate existing UDFs to PPLFuncImpTable #3576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- Problem: FunctionName.of(UPPER_CASE_I) could not be properly resolved since I is converted to dotless ı Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- date part: hour, hour_of_day, month_of_year, month, dayofweek, day_of_week, day, dayofyear, day_of_year, dayofmonth, day_of_month, minute_of_day, minute_of_hour, minute, quarter, second, second_of_minute, microsecond - date/time manipulation: addtime, adddate, date_sub, date_add, subtime, subdate, extract Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
by creating an adapt function to convert an expr function to a UDF Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…imediff UDFs Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
use timestamp to replace post process
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…ts logical plan Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
… another PR Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…primitive types Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…ementation Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…tion Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
| public static SqlReturnTypeInference TIME_APPLY_RETURN_TYPE = | ||
| opBinding -> { | ||
| RelDataType temporalType = opBinding.getOperandType(0); | ||
| if (OpenSearchTypeFactory.convertRelDataTypeToSqlTypeName(temporalType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's confusing to use SqlTypeName.TIME here, temporalType here is actually UDT whose real SqlTypeName is VARCHAR actually. As discussed, we can determine the return type based on whether temporalType is AbstractExprRelDataType with getUDT equal to EXPR_TIME
Could we remove this API convertRelDataTypeToSqlTypeName? Its logic doesn't make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the convertRelDataTypeToSqlTypeName API and adopted convertRelDataTypeToExprType to compare types, so that I can use a more unified interface to compare not only UDT but also builtin types like String. The converted ExprType is also widely used in the conversion to ExprValue (to reuse v2's implementation). Is the new implementation more logically acceptable?
| public final class DateTimeApplyUtils { | ||
| private DateTimeApplyUtils() {} | ||
|
|
||
| public static Instant applyInterval(Instant base, Duration interval, boolean isAdd) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems never used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. It was only used in previous implementations.
…Type comparision Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…java object and expr type separately Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…return types (instead of Object) Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
| import org.opensearch.sql.expression.function.ImplementorUDF; | ||
|
|
||
| /** <code>SQRT(value)</code> returns the square root of a number. */ | ||
| public class SqrtFunction extends ImplementorUDF { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
calcite has sqrt function?
0: jdbc:calcite:model=src/test/resources/mode> select sqrt(9);
+--------+
| EXPR$0 |
+--------+
| 3.0 |
+--------+
1 row selected (0.588 seconds)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQRT is only declared in SqlStdOperatorTable, but not implemented in RexImpTable. I.e. there isn't a statement like defineMethod(SQRT, BuiltInMethod.SQRT.method, NullPolicy.STRICT) in RexImpTable.
If I register sqrt to the calcite SQRT operator, it will throw an error at runtime:
...
java.lang.RuntimeException: java.sql.SQLException: Error while preparing plan [LogicalProject(name=[$0], age=[$5], month=[$3])
LogicalFilter(condition=[AND(=(SQRT(POWER($5, 2)), 30.0E0), =(CBRT(POWER($3, 3)), 4))])
...
java.lang.IllegalStateException: Unable to implement EnumerableCalc(expr#0..11=[{inputs}], expr#12=[2], expr#13=[POWER($t5, $t12)], expr#14=[SQRT($t13)], expr#15=[30.0E0:DOUBLE], expr#16=[=($t14, $t15)], expr#17=[3], expr#18=[POWER($t3, $t17)], expr#19=[CBRT($t18)], expr#20=[4], expr#21=[=($t19, $t20)], expr#22=[AND($t16, $t21)], name=[$t0], age=[$t5], month=[$t3], $condition=[$t22]): rowcount = 2.25, cumulative cost = {102.25 rows, 2801.0 cpu, 0.0 io}, id = 1388
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_state_country]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 1353
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qianheng-aws pointed out that SQRT is converted to POWER(*, 0.5). I've removed the sqrt udf and register the operator to power with 0.5 as the second parameter instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qianheng-aws pointed out that
SQRTis converted toPOWER(*, 0.5). I've removed the sqrt udf and register the operator topowerwith 0.5 as the second parameter instead.
Since Calcite already has the SQRT function definition in SqlStdOperatorTable and internal implementation (converting to POWER(*, 0.5) is also a kind of implementation), why do we still need to explicitly add this conversion in our side? @yuancu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In our implementation, when using operators from Calcite, their implementations have to exist in RexImpTable. Otherwise, it will throw an error like cannot translate call SQRT. See RexToLixTranslater.java#L1428.
In Calcite SQL, the conversion happens in SqlToRelConverter. See SqlNodeToRexConverterImpl.java#L59. However, we do not have access to SqlCall in our implementation. Therefore, we cannot directly reuse Calcite's conversion of SQRT function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuancu We can similar convert in PPLFuncImplTable.
register(SQRT,
((FunctionImp1)
(builder, arg1) -> builder.makeCall(SqlStdOperatorTable.POWER, arg1,
builder.makeApproxLiteral(BigDecimal.valueOf(0.5)))));
@qianheng-aws IMO, PPLFuncImplTable support 2 ways to define function. 1) Use existing SqlFunctions. 2) UDF. Is it correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I have registered SQRT in this fashion.
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
* refactor using timestmap Signed-off-by: xinyual <xinyual@amazon.com> * refactor for timestmap Signed-off-by: xinyual <xinyual@amazon.com> * transfer basic functions Signed-off-by: xinyual <xinyual@amazon.com> * add functions Signed-off-by: xinyual <xinyual@amazon.com> * Refactor math UDFs Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor text UDFs Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Fix function name resolving when system using Turkic languages - Problem: FunctionName.of(UPPER_CASE_I) could not be properly resolved since I is converted to dotless ı Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor date part, date/time manipulation UDFs - date part: hour, hour_of_day, month_of_year, month, dayofweek, day_of_week, day, dayofyear, day_of_year, dayofmonth, day_of_month, minute_of_day, minute_of_hour, minute, quarter, second, second_of_minute, microsecond - date/time manipulation: addtime, adddate, date_sub, date_add, subtime, subdate, extract Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor current, name, format UDFs Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor convert tz, date/timestamp diff UDFs Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor date / time functions with non-date/time arguments by creating an adapt function to convert an expr function to a UDF Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor datetime, sec_to_time, time_to_sec, sysdate, timestampadd, timediff UDFs Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor week, nullif, ifnull, if UDFs Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Remove BuiltinFunctionUtils and unnecessary UDFs Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Patch: throw proper semantic errors for date_part-related udfs Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Fix bugs in datetime UDFs Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor: move all udf to org.opensearch.sql.expression.function.udf Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * use timestamp to replace post process Signed-off-by: xinyual <xinyual@amazon.com> * Add more ITs for last day & restore Prometheus settings Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Fix euler with UDF instead of directly returning a literal to match its logical plan Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Reimplment last_day UDF with expr methods Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Ignore tests related to parameter validaton temporarily since it's in another PR Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Deprecate timestamp string as data context for function properties Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor: unify datetime udf structures Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Apply spotless & fix zone id acquirement Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Fix unixtimestamp from double when locale uses non-arabic numbers Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Box operands so that relection calls match signatures containing non-primitive types Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Use system default time zone when restoring function properties Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor return types Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Use builtin case and coalesce to implement if, ifnull, nullif UDFs Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Replace replace UDF with builtin implementations & add docs for replace, locate Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Correct locate function documents Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Separate adddate and date_add implementations for readability Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Add class documentations for UDFs Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Create PPLReturnTypes class to store common return types that it not covered in calcite ReturnTypes class Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Remove udf operators with the same implementations but different names Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor: Move type comparison earlier in Add/Sub Date function implementation Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor: update add/sub time's implementation with adaptExprMethodWithPropertiesToUDF Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor: Move type comparison earlier in last day function implementation Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor: implement date function with adaptExprMethodToUDF Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor: Move type comparison earlier in extract function implementation Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor: Move type comparison earlier in format function implementation Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor: Move type comparison earlier in timestampdiff function implementation Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor: Move type comparison earlier in weekday function implementation Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor: split add and sub date to 2 static functions Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Eliminate convertRelDataTypeToSqlTypeName API, replacing it with ExprType comparision Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor: pass expr values to udf implementations instead of passing java object and expr type separately Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor: unify types in udf to use expr types (deprecating SqlTypeName) Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Refactor: change return types of UDF implementations to their actual return types (instead of Object) Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Add a test to unix_timestamp with microseconds Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Minor: fix calcite datetime tests Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Replace sqrt udf with a call to calcite's power(x, 0.5) Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Replace locate udf with SqlStdOperatorTable.POSITION Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Fix sqrt logical plan test Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Fix expectedSparkSql in sqrt test Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> --------- Signed-off-by: xinyual <xinyual@amazon.com> Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> Co-authored-by: xinyual <xinyual@amazon.com> Signed-off-by: xinyual <xinyual@amazon.com>
Description
Refactor existing math, datetime, text, condition UDFs to PPLFuncImpTable.
Why self-implementing instead of leveraging Calcite builtin functions
Not implemented by calcite
Different definition
Datetime related: we use UDT and accepts STRING most of the time.
Note:
TODOs:
Related Issues
Resolves #3524
Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.