Skip to content

Conversation

@yuancu
Copy link
Collaborator

@yuancu yuancu commented May 16, 2025

Description

Implement parameter validations. It supports validations for

  • Calcite built-in functions. E.g. registerOperator(ABS, SqlStdOperatorTable.ABS)
  • Self-implemented UDFs. E.g. registerOperator(SHA2, PPLBuiltinOperators.SHA2);
  • Directly registered implementations (with out registerOperator). E.g.
     register(
           TRIM,
           (FuncImpl1)
               (builder, arg) ->
                   builder.makeCall(
                       SqlStdOperatorTable.TRIM,
                       builder.makeFlag(Flag.BOTH),
                       builder.makeLiteral(" "),
                       arg));

Besides, it support functions with signature overriddings. E.g. timestamp has multiple signatures:

(STRING/DATE/TIME/TIMESTAMP) -> TIMESTAMP
(STRING/DATE/TIME/TIMESTAMP, STRING/DATE/TIME/TIMESTAMP) -> TIMESTAMP

This implementation is based on the PoC in #3562, where it explains why we can not directly adopt Calcite's built-in type checking mechanism for SQL and how our type checking works.

Issues Resolved

Resolves #3608

Implementation Notes

Classes related to type checking & their roles

  • UDFOperandMetadata: Created for compatibility with SqlUserDefinedFunction, which serves as the basis for implementing our UDFs. Since SqlUserDefinedFunction only accepts SqlOperandMetadata as its type checker, and SqlOperandMetadata is incompatible with Calcite's built-in operators, UDFOperandMetadata was introduced. It acts as a wrapper around Calcite's operand type checkers, allowing us to reuse them. This ensures that type checkers can be utilized in the same way as Calcite's built-in operators for type checking.
  • PPLTypeChecker: The type checker that does the real work. It retrieved acceptable types from Calcite's type checkers, and compare them with the actual ones.
  • SqlOperandTypeChecker, ImplicitCastOperandTypeChecker, FamilyOperandTypeChecker, ...: Calcite's builtin type checkers. Cannot be directly used since it requires a SqlCallBinding: boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure).
classDiagram
    class SqlOperandTypeChecker {
        <<interface>>
        +checkOperandTypes(SqlCallBinding, boolean)
    }

    class CompositeOperandTypeChecker {
    }

    class FamilyOperandTypeChecker {
    }

    class PPLTypeChecker {
		    <<interface>>
        +checkOperandTypes(actualTypes) : boolean
    }
    
    class PPLFamilyTypeChecker {
	    - families: List[SqlTypeFamily]
    }
    
    class PPLCompositeTypeChecker {
	    - rules: List[FamilyOperandTypeChecker]
    }

    class UDFOperandMetadata {
       <<interface>>
       <<extends>> SqlOperandMetadata
       <<extends>> ImplicitCastOperandTypeChecker
        +getInnerTypeChecker() : SqlOperandTypeChecker
    }
    
    class SqlOperator {
		    <<abstract>>
		    - operandTypeChecker: SqlOperandTypeChecker
		    + getOperandTypeChecker() : SqlOperandTypeChecker
    }
    
    class SqlUserDefinedFunction {
				+SqlUserDefinedFunction(SqlOperandMetadata, ...)
    }
    
    

    %% Relationships
    SqlOperandTypeChecker <|.. FamilyOperandTypeChecker
    SqlOperandTypeChecker <|.. CompositeOperandTypeChecker
    SqlOperator <|-- SqlUserDefinedFunction
    PPLTypeChecker  <|.. PPLFamilyTypeChecker
    PPLTypeChecker  <|.. PPLCompositeTypeChecker
    UDFOperandMetadata --> SqlUserDefinedFunction : feeds to constructor
    SqlOperator *-- SqlOperandTypeChecker: contains
    %% UDFOperandMetadata *-- SqlOperandTypeChecker
    PPLFamilyTypeChecker ..> FamilyOperandTypeChecker: wraps
    PPLCompositeTypeChecker ..> CompositeOperandTypeChecker: wraps
Loading

Type checking for Calcite's built-in operators & for self-implemented UDFs

  • For Calcite's built-in operators: the operator's type checker is first retrieved as a SqlOperandTypeChecker, then wrapped with PPLTypeChecker which does the actual type checking with the expected param types in the SqlOperandTypeChecker.
  • For UDFs: We first wrap Calcite's type checker with UDFOperandMetadata to create a SqlUderDefinedFunction, which implements SqlOperator. Afterwards, the actual Calcite type checker can be retrieved and used in the same way as Calcite's built-in operators.
  • For UDFs registered with register(funcName, FuncImp) interface, since there is no SqlOperator involved, we directly pass a PPLTypeChecker when creating a FuncImp instance.

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

qianheng-aws and others added 8 commits April 18, 2025 14:53
Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…hecking for non-operator registrated udf

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
yuancu added 2 commits May 16, 2025 16:55
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
yuancu added 14 commits May 22, 2025 18:21
…ions

- additionally fixed span type checker

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
… composition field of CompositeOperandTypeChecker via relection. May cause fatal errors if IllegalAccessException is thrown.

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
… checker)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
composition.name()));
}
} catch (IllegalAccessException e) {
LogManager.getLogger(PPLTypeChecker.class).error(e);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using @log4j2?

Copy link
Collaborator Author

@yuancu yuancu Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it was in an interface, I was hesitated in adding a static member logger to it. Therefore, I used LogManageer.getLogger().error(). It's also part of log4j.

Now this line is removed.

@Test
public void testLowerWithIntegerType() {
String ppl = "source=EMP | eval lower_name = lower(EMPNO) | fields lower_name";
Assert.assertThrows(Exception.class, () -> getRelNode(ppl));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also see VARCHAR in other test cases. for instance, testTimestampWithWrongArg, does this issue all solved?

yuancu added 3 commits May 28, 2025 11:08
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
LantaoJin
LantaoJin previously approved these changes Jun 4, 2025
Copy link
Member

@LantaoJin LantaoJin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could investigate how to leverage SqlNode validation later 3.1.0. Approved.

yuancu added 2 commits June 5, 2025 19:33
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
@LantaoJin LantaoJin merged commit 270aa0d into opensearch-project:main Jun 6, 2025
16 of 22 checks passed
penghuo pushed a commit that referenced this pull request Jun 16, 2025
* Support type checker For PPL functions

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Implement type checkers for UDF

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Implement composite type checker

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Define type checkers for UDFs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Display expected signatures when type checking fails & support type checking for non-operator registrated udf

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Add ITs for UDF type checking

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Fix: correct CONV's operand types

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Fix test parse IT

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Remove unused code in UserDefinedFunctionUtils

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Refactor: improve ITs, javadocs, type checkers

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Improve javadoc for PPLTypeChecker

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Return false at validation for mismatched operand count with expectations
- additionally fixed span type checker

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Add type checker for cidrmatch

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Correct cidrmatch type checker

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Fix compile error in CalcitePPLAbstractTest.java

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Experiment: support type checking for built-in operators by accessing composition field of CompositeOperandTypeChecker via relection. May cause fatal errors if IllegalAccessException is thrown.

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Check composition type for only calcite's built-in operators

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Support parameter validation for comparabale operators

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Support parameter validation for coalesce (by reusing comparable type checker)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Support parameter validation for substring

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Support parameter validation for if

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Support parameter validation for nullif, isempty, isblank

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Align types in type check error information to actual types in PPL

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Display PPL types when failing to resolve a function

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Create type checker for geoip function

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Define type checker for grok and item operators

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Co-authored-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

calcite calcite migration releated

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] PPL-Calcite Function Parameter Validation

5 participants