Use Calcite's validation system for type checking & coercion #4892
Conversation
📝 Walkthrough

Summary by CodeRabbit

Introduces extensive Calcite-based PPL validation and type-coercion plumbing: new PPL-specific validator/convertlets/coercion rules, UDT/type utilities, many UDF operand-metadata updates, Rex/Rel shuttles and Rel↔Sql converters, removal of legacy coercion/type-checker code, and large test/expected-output updates across integ and unit tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Rel as RelNode (PPL plan)
    participant Shuttle as PplRelToSqlRelShuttle
    participant RelToSql as OpenSearchRelToSqlConverter
    participant Validator as SqlValidator / PplValidator
    participant SqlToRel as OpenSearchSqlToRelConverter
    participant Planner as QueryService (convertToCalcitePlan)
    Rel->>Shuttle: traverse & normalize RexLiterals
    Shuttle->>RelToSql: convert RelNode -> SqlNode
    RelToSql->>Validator: validate SqlNode (type coercion, implicit casts)
    Validator-->>SqlToRel: produce validated SqlNode
    SqlToRel->>Planner: convert validated SqlNode -> RelNode (validated RelNode)
    Planner->>Planner: continue planning / optimize using validated RelNode
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks: 4 passed, 1 failed (1 warning)
e085f81 to fc6dd27
…checking Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
… logics
- 2 more ITs passed in PPLBuiltinFunctionIT
- this fixes testRand, where deserialization of sarg does not restore its type; todo: update the toRex in ExtendedRelJson to align with the latest version
…estamp; (time, timestamp) -> timestamp (1240/1599)
…2/1872)
- allow type cast; rewrite call to sql compare to custom ip compare
…1356/1599 | 1476/1915)
…d in mvindex's implementation (1580/2015)
…iting (1579/2015)
…pe hint
…e inference (1701/2015)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
1818-1825: Track the TODO for push-down regression. The new TODO suggests a known push-down regression. Please link it to a tracking issue (or add one) so it doesn’t get lost.
If you want, I can draft the issue text or propose a targeted test to lock in the desired push-down behavior.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
1736-1835: Preserve explicit window ORDER BY when embedding collations.

`embedExistingCollationsIntoOver` unconditionally replaces the window's order keys with the plan collation, overwriting any explicit ORDER BY specified in the window. When a window already has `orderKeys`, they should be preserved. Consider only injecting plan collations when the window has no existing order keys.

🛠️ Proposed fix

```diff
 public RexNode visitOver(RexOver over) {
   RexWindow window = over.getWindow();
+  ImmutableList<RexFieldCollation> orderKeys =
+      window.orderKeys.isEmpty() ? rexCollations : window.orderKeys;
+  if (orderKeys.isEmpty()) {
+    return over;
+  }
   return context.rexBuilder.makeOver(
       over.getType(),
       over.getAggOperator(),
       over.getOperands(),
       window.partitionKeys,
-      rexCollations,
+      orderKeys,
       window.getLowerBound(),
       window.getUpperBound(),
       window.isRows(),
       true,
       false,
       over.isDistinct(),
       over.ignoreNulls());
 }
```

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)
398-425: Fail fast on null external registrations and make duplicate detection atomic.

Right now a null operator/aggFunction would surface as an NPE later, and `containsKey` + `put` can misreport duplicates under concurrency.

🐛 Proposed fix (null guard + atomic warn)

```diff
 public void registerExternalOperator(BuiltinFunctionName functionName, SqlOperator operator) {
-  if (externalFunctionRegistry.containsKey(functionName)) {
-    logger.warn(
-        String.format(Locale.ROOT, "Function %s is registered multiple times", functionName));
-  }
-  externalFunctionRegistry.put(functionName, (builder, args) -> builder.makeCall(operator, args));
+  if (operator == null) {
+    throw new IllegalArgumentException(
+        String.format(Locale.ROOT, "Operator cannot be null for function %s", functionName));
+  }
+  FunctionImp impl = (builder, args) -> builder.makeCall(operator, args);
+  FunctionImp previous = externalFunctionRegistry.put(functionName, impl);
+  if (previous != null) {
+    logger.warn(
+        String.format(Locale.ROOT, "Function %s is registered multiple times", functionName));
+  }
 }

 public void registerExternalAggOperator(
     BuiltinFunctionName functionName, SqlUserDefinedAggFunction aggFunction) {
-  if (aggExternalFunctionRegistry.containsKey(functionName)) {
-    logger.warn(
-        String.format(
-            Locale.ROOT, "Aggregate function %s is registered multiple times", functionName));
-  }
+  if (aggFunction == null) {
+    throw new IllegalArgumentException(
+        String.format(Locale.ROOT, "Aggregate function cannot be null for %s", functionName));
+  }
   AggHandler handler =
       (distinct, field, argList, ctx) ->
           UserDefinedFunctionUtils.makeAggregateCall(
               aggFunction, List.of(field), argList, ctx.relBuilder);
-  aggExternalFunctionRegistry.put(functionName, handler);
+  AggHandler previous = aggExternalFunctionRegistry.put(functionName, handler);
+  if (previous != null) {
+    logger.warn(
+        String.format(
+            Locale.ROOT, "Aggregate function %s is registered multiple times", functionName));
+  }
 }
```
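The race this atomic-put approach closes comes from `Map.put`'s contract: it replaces the mapping and returns the previous value (or null) in one step, so duplicate detection and the write can't interleave with another thread. A stdlib-only sketch (the registry name and values here are illustrative, not the project's actual types):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AtomicDuplicateDetection {
    public static void main(String[] args) {
        Map<String, String> registry = new ConcurrentHashMap<>();

        // First registration: put returns null because no previous mapping existed.
        String previous = registry.put("ADD", "impl-1");
        System.out.println(previous); // null

        // Duplicate registration: the same put call that performs the write also
        // reports the old value, so no separate containsKey check (and no
        // check-then-act race window) is needed.
        previous = registry.put("ADD", "impl-2");
        if (previous != null) {
            System.out.println("duplicate registration of ADD");
        }
    }
}
```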
🤖 Fix all issues with AI agents
In
`@core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/AppendFunctionImpl.java`:
- Around line 49-50: Update the operand type checker in AppendFunctionImpl by
changing getOperandMetadata() to use OperandTypes.VARIADIC instead of
OperandTypes.SAME_VARIADIC so mixed-type variadic arguments are allowed
(matching AppendCore.collectElements() behavior); mirror the approach used in
MVAppendFunctionImpl and keep the existing return-type inference logic intact to
handle zero-arg cases.
In
`@core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java`:
- Around line 494-510: The rewriteCall in PPLBuiltinOperators.ATAN unsafely
casts SqlCall to SqlBasicCall causing ClassCastException for other SqlCall
subclasses (e.g., SqlCase); fix by first checking if call instanceof
SqlBasicCall and if so setOperator as before, otherwise rebuild and return a new
SqlBasicCall using the chosen SqlOperator (SqlStdOperatorTable.ATAN2 or
SqlStdOperatorTable.ATAN), passing call.getOperandList() and
call.getParserPosition() so operands and position are preserved; ensure you
reference the SqlOperator selection logic (call.getOperandList().size() == 2)
and preserve semantics when constructing the new SqlBasicCall.
In `@integ-test/src/test/resources/expectedOutput/calcite/agg_filter_nested.yaml`:
- Line 5: Add a test record with a null or missing author.name to the dataset
that generates this expected output so the CASE expression in LogicalProject
($f0=[CASE(<($7, 'K'), 1, null:INTEGER)]) is exercised for the null branch;
update the input test data used by the calcite/agg_filter_nested test (the
sample documents) to include one document where author is absent or author.name
is explicitly null, then rerun the test to regenerate/verify the expected output
includes the null branch behavior.
♻️ Duplicate comments (5)
docs/user/ppl/interfaces/endpoint.md (1)
123-125: Add a brief note explaining logical vs physical cast representation.

The extended example now shows `30:BIGINT` in the physical plan while the logical plan keeps `>($5, 30)`. A short note would prevent confusion for users comparing plans or parsing explain output.

core/src/main/java/org/opensearch/sql/executor/QueryService.java (1)
310-317: Add logging when validation is skipped for tolerant exceptions.

When `ValidationUtils.tolerantValidationException(e)` returns true and the original `relNode` is returned, there's no logging to indicate that validation was bypassed. This makes it harder to troubleshoot queries that silently skip validation due to known Calcite limitations.

🔍 Suggested logging for observability

```diff
 try {
   validator.validate(Objects.requireNonNull(rewritten));
 } catch (CalciteContextException e) {
   if (ValidationUtils.tolerantValidationException(e)) {
+    log.debug("Skipping validation due to known Calcite limitation: {}", e.getMessage());
     return relNode;
   }
   throw new ExpressionEvaluationException(e.getMessage(), e);
 }
```

core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (1)
140-145: Close the JDBC Statement created for validator initialization.

Line 142 creates a `Statement` that is never closed; this can leak resources under load. Please use try-with-resources or retain and close it when the context is disposed.

As per coding guidelines, use try-with-resources for resource cleanup.
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (2)
500-515: Null guard for `registerOperator` (same as earlier note).

Still missing a null check on `operator` before building the lambda.
844-855: Guard empty-arg ADD dispatch (same as earlier note).
`Stream.of(args).allMatch(...)` returns `true` for empty args, selecting CONCAT unexpectedly; add an `args.length > 0` guard and document the dispatch pattern.
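The empty-args pitfall is the vacuous truth of `allMatch`: on an empty stream it returns true without ever evaluating the predicate, so a zero-argument call would look "all character-typed" and get routed to CONCAT. A stdlib-only illustration (the `instanceof String` predicate is a stand-in for the real type check):

```java
import java.util.stream.Stream;

public class AllMatchEmptyArgs {
    // Stand-in for "is this argument character-typed?"
    static boolean allCharacter(Object[] args) {
        return Stream.of(args).allMatch(a -> a instanceof String);
    }

    public static void main(String[] args) {
        // Vacuously true: no element ever fails the predicate.
        System.out.println(allCharacter(new Object[0])); // true

        // Guarded dispatch: require at least one argument before concluding
        // the call is string concatenation.
        Object[] empty = new Object[0];
        boolean dispatchToConcat = empty.length > 0 && allCharacter(empty);
        System.out.println(dispatchToConcat); // false
    }
}
```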
🧹 Nitpick comments (5)
core/src/main/java/org/opensearch/sql/executor/QueryService.java (1)
283-292: Add `@throws` documentation for the exception case.

Per coding guidelines, public/private methods with JavaDoc should document exceptions. The method throws `ExpressionEvaluationException` on validation failure.

📝 Suggested JavaDoc update

```diff
 /**
  * Validates a RelNode by converting it to SqlNode, performing validation, and converting back.
  *
  * <p>This process enables Calcite's type validation and implicit casting mechanisms to work on
  * PPL queries.
  *
  * @param relNode the relation node to validate
  * @param context the Calcite plan context containing the validator
  * @return the validated (and potentially modified) relation node
+ * @throws ExpressionEvaluationException if validation fails with a non-tolerant error
  */
```

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MapAppendFunctionImpl.java (1)
49-50: Align MAP operand validation with String-key expectations.

Line 50 now uses `OperandTypes.family(SqlTypeFamily.MAP, SqlTypeFamily.MAP)`, which only checks the MAP family and doesn't constrain key types. The implementation casts keys to `String`, so non-string keys could pass validation but fail at runtime. Please confirm map keys are always `VARCHAR` in this engine; otherwise consider tightening the checker (custom `SqlOperandTypeChecker`) or adding a key-type guard in `verifyMap`.

integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java (1)
491-504: Avoid re-cleaning the actual YAML inside the loop.

You can precompute the cleaned actual YAML once and add a null/empty guard for clearer failures. This keeps comparisons consistent and slightly cheaper.

♻️ Proposed refactor

```diff
 public static void assertYamlEqualsIgnoreId(List<String> expectedYamls, String actualYaml) {
-  if (expectedYamls.isEmpty()) {
+  if (expectedYamls == null || expectedYamls.isEmpty()) {
     throw new IllegalArgumentException("expectedYamls list must not be empty");
   }
+  String cleanedActual = cleanUpYaml(actualYaml);
   AssertionError lastError = null;
   for (String expectedYaml : expectedYamls) {
     try {
-      assertYamlEquals(cleanUpYaml(expectedYaml), cleanUpYaml(actualYaml));
+      assertYamlEquals(cleanUpYaml(expectedYaml), cleanedActual);
       return;
     } catch (AssertionError e) {
       lastError = e;
     }
   }
   throw lastError;
 }
```

core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (1)
90-96: Add an explicit setter with JavaDoc for the operator table provider.

Lombok's public static setter won't carry method-level JavaDoc, but the guidelines require public methods to be documented. Consider replacing the Lombok setter with an explicit method (or reduce its visibility if it's not part of the public API).

♻️ Suggested refactor

```diff
-@Setter private static SqlOperatorTableProvider operatorTableProvider;
+private static SqlOperatorTableProvider operatorTableProvider;
+
+/**
+ * Sets the SQL operator table provider. This must be called during initialization by the
+ * opensearch module.
+ *
+ * @param provider the provider to use for obtaining operator tables
+ */
+public static void setOperatorTableProvider(SqlOperatorTableProvider provider) {
+  operatorTableProvider = java.util.Objects.requireNonNull(provider, "provider");
+}
```

As per coding guidelines, public methods should have JavaDoc.
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)
986-1003: Consider guarding duplicate agg registrations.

`AggBuilder.register` currently allows silent overrides; mirroring the `Builder` guard would make mistakes louder.

♻️ Suggested change

```diff
 void register(BuiltinFunctionName functionName, AggHandler aggHandler) {
-  map.put(functionName, aggHandler);
+  if (map.containsKey(functionName)) {
+    throw new IllegalStateException(
+        String.format(
+            Locale.ROOT,
+            "Each aggregate function can only be registered once: %s",
+            functionName));
+  }
+  map.put(functionName, aggHandler);
 }
```
- use OperandTypes.CHARACTER consistently
- use OperandTypes.VARIADIC for append function
- add safeguard for atan rewrite
```java
import org.apache.calcite.sql.type.OperandTypes;
import org.apache.calcite.sql.type.SqlReturnTypeInference;
import org.apache.calcite.sql.validate.SqlUserDefinedFunction;
import org.jspecify.annotations.NonNull;
```
lombok.NonNull is not for annotating return value; Calcite adopts org.checkerframework.checker.nullness.qual.NonNull, which is heavier than org.jspecify.annotations.NonNull.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
core/src/main/java/org/opensearch/sql/expression/function/udf/math/NumberToStringFunction.java (1)
33-40: Add JavaDoc for public getOperandMetadata().

This public method is in `core/src/main/java` and now changed; guidelines require JavaDoc with at least `@return`. As per coding guidelines, please add JavaDoc.

✍️ Suggested update

```diff
 @Override
+/**
+ * {@inheritDoc}
+ *
+ * @return non-null operand metadata for numeric operands
+ */
 public @NonNull UDFOperandMetadata getOperandMetadata() {
   return PPLOperandTypes.NUMERIC;
 }
```

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVZipFunctionImpl.java (1)
65-74: Add JavaDoc for the public override.

`getOperandMetadata()` is public and changed; it still lacks JavaDoc with `@return`, which is required for `core/src/main/java/**/*.java`.

💡 Suggested JavaDoc

```diff
 @Override
+/**
+ * Returns operand metadata for mvzip.
+ *
+ * @return operand metadata for 2- or 3-argument variants
+ */
 public @NonNull UDFOperandMetadata getOperandMetadata() {
```

As per coding guidelines, document public methods with JavaDoc.
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVFindFunctionImpl.java (1)
48-53: Add JavaDoc for this public override.

The method is public but has no JavaDoc with `@return`. Please add a brief block to satisfy the project rules.

📝 Proposed JavaDoc

```diff
 @Override
-public @NonNull UDFOperandMetadata getOperandMetadata() {
+/**
+ * Operand metadata for mvfind.
+ *
+ * @return operand metadata describing accepted argument families
+ */
+public @NonNull UDFOperandMetadata getOperandMetadata() {
```

As per coding guidelines, ...
core/src/main/java/org/opensearch/sql/expression/function/udf/RelevanceQueryFunction.java (1)
33-45: Add JavaDoc with `@return` for the public method.

Public methods must have proper JavaDoc; the current block comment doesn’t include `@return`. As per coding guidelines, convert this to JavaDoc and include the return tag.

🔧 Proposed fix

```diff
-/*
- * The first parameter is always required (either fields or query).
- * The second parameter is query when fields are present, otherwise it's the first parameter.
- * Starting from the 3rd parameter (or 2nd when no fields), they are optional parameters for relevance queries.
- * Different query has different parameter set, which will be validated in dedicated query builder.
- * Query parameter is always required and cannot be null.
- */
+/**
+ * The first parameter is always required (either fields or query).
+ * The second parameter is query when fields are present, otherwise it's the first parameter.
+ * Starting from the 3rd parameter (or 2nd when no fields), they are optional parameters for relevance queries.
+ * Different query has different parameter set, which will be validated in dedicated query builder.
+ * Query parameter is always required and cannot be null.
+ *
+ * @return operand metadata for relevance query functions
+ */
```

core/src/main/java/org/opensearch/sql/expression/function/udf/SpanFunction.java (1)
88-96: Remove unsupported CHARACTER operand from metadata or implement handling.

The operand metadata declares `(CHARACTER, NUMERIC, CHARACTER)` as valid (line 63), but the implementor does not support it. When a CHARACTER field is passed with a CHARACTER unit string:
- Line 88 condition is false (unitType is VARCHAR, not null/ANY)
- Line 97 condition is false (plain VARCHAR is not instanceof ExprSqlType)
- Execution falls through to the exception at line 115
All current tests use either numeric fields or fields wrapped in ExprSqlType (EXPR_DATE/EXPR_TIME/EXPR_TIMESTAMP), never plain STRING/CHARACTER fields. Either remove the CHARACTER operand from metadata or implement the missing path to handle string-based spans.
🤖 Fix all issues with AI agents
In
`@core/src/main/java/org/opensearch/sql/calcite/utils/UserDefinedFunctionUtils.java`:
- Around line 232-235: getOperandMetadata() is annotated `@NonNull` but returns
the nullable field operandMetadata (set in adaptExprMethodToUDF), violating the
contract; either validate and throw early when operandMetadata is null or return
a non-null default instance. Fix by updating adaptExprMethodToUDF to ensure it
never leaves operandMetadata null (validate inputs and throw
IllegalArgumentException) or change getOperandMetadata() to return a safe
default UDFOperandMetadata (or throw a clear IllegalStateException) when
operandMetadata == null, referencing getOperandMetadata(), operandMetadata, and
adaptExprMethodToUDF to locate the change.
In
`@core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java`:
- Around line 55-58: TransformFunctionImpl currently uses
sqlOperatorBinding.getOperandType(1) which returns the FUNCTION type during
validation, causing incorrect ARRAY<FUNCTION> inference; update
TransformFunctionImpl to mirror ReduceFunctionImpl by distinguishing
SqlCallBinding vs RexCallBinding: when sqlOperatorBinding is an instance of
SqlCallBinding, extract the SqlLambda from the call, obtain the lambda
expression and derive its return type via SqlTypeUtil.deriveType(sqlValidator,
lambdaExpr) (or equivalent) and use that as lambdaReturnType; when it's a
RexCallBinding (already compiled), keep using getOperandType(1); update the
return type creation code that calls createArrayType(...) to use the derived
lambdaReturnType for correct ARRAY<actual_return_type>.
In
`@core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java`:
- Around line 71-74: getOperandCountRange currently only enforces a minimum of 3
arguments which allows even argument counts and causes IndexOutOfBounds in
JsonSetFunctionImpl.eval when it assumes path/value pairs; update
getOperandCountRange in class JsonSetFunctionImpl to validate that the total
operand count is odd (>=3) — either by returning a custom SqlOperandCountRange
that enforces odd counts or by adding explicit validation at the start of eval
(method eval) to throw a clear validation exception if keys.size() is even,
preventing the keys.get(i+1) access from ever indexing out of bounds.
In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/binning/MinspanBucketFunction.java`:
- Around line 50-52: Add a JavaDoc block to the public method
getOperandMetadata() in the MinspanBucketFunction class that documents the
method purpose and includes an `@return` tag describing the UDFOperandMetadata
value being returned (specifically that it supplies the operand types
PPLOperandTypes.NUMERIC_NUMERIC_NUMERIC_NUMERIC for this UDF). Keep the comment
concise and placed immediately above the getOperandMetadata() method
declaration.
In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/ToSecondsFunction.java`:
- Around line 53-56: Add JavaDoc to the public override method
getOperandMetadata in ToSecondsFunction: describe what the method returns and
include an `@return` tag that specifies it returns the operand metadata
(PPLOperandTypes.DATETIME_OR_STRING_OR_INTEGER) used by the UDF; keep the
comment concise and place it immediately above the getOperandMetadata()
override.
- Line 23: Add JavaDoc to the public override getOperandMetadata() in
ToSecondsFunction explaining it "Returns the operand metadata for the function
signature" and annotate the method return with `@NonNull` as suggested; update the
method in class ToSecondsFunction to return
PPLOperandTypes.DATETIME_OR_STRING_OR_INTEGER and include the `@Override` and
JavaDoc. Also verify that the org.jspecify.annotations.NonNull import is
available to the build: if not provided by the runtime/plugin, add the jspecify
dependency to core/build.gradle so the import resolves.
In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/WeekdayFunction.java`:
- Around line 52-55: Add a JavaDoc block to the public override of
getOperandMetadata in WeekdayFunction describing the method and include an
`@return` tag explaining it returns the operand metadata for this UDF; reference
the getOperandMetadata method and the PPLOperandTypes.DATETIME_OPTIONAL_INTEGER
return value so it's clear this documents that the function accepts a DATETIME
and an optional INTEGER operand type.
In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/math/EulerFunction.java`:
- Around line 42-45: The public override EulerFunction.getOperandMetadata lacks
JavaDoc; add a short Javadoc comment (or use {`@inheritDoc`}) above the method and
include an `@return` describing that it returns the UDFOperandMetadata for this
UDF (specifically PPLOperandTypes.NONE), referencing the return type
UDFOperandMetadata and the constant PPLOperandTypes.NONE so readers know this
UDF takes no operands.
In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/math/ModFunction.java`:
- Around line 42-44: Add a JavaDoc comment to the public method getKind() in
class ModFunction: document that this method returns the SQL operator kind for
the modulo operation and include an `@return` tag stating it returns SqlKind.MOD;
ensure the JavaDoc also notes it overrides the superclass method (since
getKind() is an override) and follows project formatting conventions.
In
`@core/src/main/java/org/opensearch/sql/expression/function/UserDefinedFunctionBuilder.java`:
- Around line 36-46: Update the Javadoc for getOperandMetadata() to match the
implementation: remove the phrase "default null without checking" and document
that the default returns a non-null UDFOperandMetadata that wraps
OperandTypes.VARIADIC (i.e., uses a permissive type checker and effectively
skips strict type checking). Mention the `@NonNull` return contract and reference
UDFOperandMetadata.wrap(...) and OperandTypes.VARIADIC in the comment so readers
see the actual behavior.
In `@docs/dev/ppl-functions.md`:
- Around line 131-147: The doc example uses an unqualified ADD symbol but the
register signature requires a BuiltinFunctionName enum; update the example to
call register with BuiltinFunctionName.ADD (e.g.,
register(BuiltinFunctionName.ADD, ...)) so it matches the API; ensure the
surrounding references (register, FunctionImp, builder.makeCall,
SqlStdOperatorTable.CONCAT, SqlStdOperatorTable.PLUS,
OpenSearchTypeUtil::isCharacter) remain unchanged.
- Around line 192-196: Update the docs to reflect the new UDAF registration API
by replacing references to AggBuilder::registerOperator(BuiltinFunctionName
functionName, SqlAggFunction aggFunction) with
AggBuilder::register(BuiltinFunctionName functionName, AggHandler aggHandler);
also ensure the subsequent bullets and any example snippets or mentions that
suggest registerOperator are changed to use AggBuilder::register and the
AggHandler/SqlAggFunction parameter names/types consistent with the PR
(including keeping the note about PPLFuncImpTable::registerExternalAggOperator
for engine-dependent functions).
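One of the instructions above asks that `json_set`'s operand count be constrained to one document followed by one or more path/value pairs; that rule reduces to "odd and at least 3" (count = 1 + 2N, N ≥ 1). A small stand-alone predicate sketches the check a custom operand-count range (or an explicit guard in `eval`) would enforce; the method name is illustrative, not the project's API:

```java
public class JsonSetOperandCount {
    // json_set(doc, path1, value1[, path2, value2, ...]):
    // one document plus N (path, value) pairs => count = 1 + 2N, N >= 1.
    static boolean isValidOperandCount(int count) {
        return count >= 3 && count % 2 == 1;
    }

    public static void main(String[] args) {
        System.out.println(isValidOperandCount(3)); // true: doc + one pair
        // An even count means a dangling path with no value, which is what
        // makes keys.get(i + 1) index out of bounds in the unguarded eval.
        System.out.println(isValidOperandCount(4)); // false
        System.out.println(isValidOperandCount(5)); // true: doc + two pairs
    }
}
```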
♻️ Duplicate comments (1)
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.java (1)
42-48: Javadoc documents STRING operand signatures that the type checker rejects.

The Javadoc claims `(IP, STRING)` and `(STRING, IP)` signatures, but the type checker at lines 135-136 only accepts operands where `OpenSearchTypeUtil.isIp(type, true)` returns true. This method does not accept VARCHAR/CHAR types. Either update the Javadoc to reflect the actual `(IP, IP)`-only behavior, or modify the type checker to also accept character types if STRING support is intended.
🧹 Nitpick comments (14)
core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/ExtractFunction.java (1)
51-53: Add JavaDoc for public method.

The public override lacks required JavaDoc with `@return`. As per coding guidelines, please document it.

💡 Suggested JavaDoc

```diff
 @Override
+/**
+ * Returns operand metadata for the EXTRACT function.
+ *
+ * @return operand metadata for (STRING, DATE/TIME/TIMESTAMP/STRING)
+ */
 public @NonNull UDFOperandMetadata getOperandMetadata() {
   return PPLOperandTypes.STRING_DATETIME;
 }
```

As per coding guidelines, document public methods with JavaDoc.

core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/SecToTimeFunction.java (1)
core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/SecToTimeFunction.java (1)
64-73: Consider adding JavaDoc to the public static utility method.

The `secToTime` method is public and could be called directly. Per coding guidelines, public methods should have JavaDoc with `@param`, `@return`, and `@throws` documentation. This would help clarify the expected input range and output format.

📝 Suggested JavaDoc

```diff
+/**
+ * Converts seconds to time string in HH:mm:ss[.nnnnnn] format.
+ *
+ * @param seconds the number of seconds to convert (integral or decimal)
+ * @return time string representation
+ */
 public static String secToTime(Number seconds) {
```

core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/DatePartFunction.java (1)
52-52: Minor typo in JavaDoc: "MIBUTEOFHOUR" should be "MINUTEOFHOUR".

This appears to be a pre-existing typo, but since you're touching this file, consider fixing it.

📝 Suggested fix

```diff
-   * <li>MIBUTEOFHOUR
+   * <li>MINUTEOFHOUR
```

core/src/main/java/org/opensearch/sql/expression/function/udf/ToNumberFunction.java (1)
98-100: Empty catch block silently swallows parsing errors.

The empty catch block suppresses all exceptions without logging or documentation, making it difficult to diagnose conversion failures. If returning `null` on parse failure is intentional behavior, consider at minimum adding a comment explaining this design choice, or logging at debug/trace level.

♻️ Suggested improvement

```diff
 } catch (Exception e) {
-
+  // Intentionally return null for unparseable input (lenient parsing)
+  // Callers should handle null return appropriately
 }
```

Or, for better observability:

```diff
+import java.util.logging.Level;
+import java.util.logging.Logger;
 ...
+private static final Logger LOGGER = Logger.getLogger(ToNumberFunction.class.getName());
 ...
 } catch (Exception e) {
-
+  LOGGER.log(Level.FINE, "Failed to parse number: " + numStr, e);
 }
```

core/src/main/java/org/opensearch/sql/expression/function/udf/math/ConvFunction.java (2)
26-29: Pre-existing doc typo: "floor function" should be "conv function".

The class-level JavaDoc appears to have been copied from another function and still references "floor function" instead of "conv function".

📝 Suggested fix

```diff
 /**
  * Convert number x from base a to base b<br>
- * The supported signature of floor function is<br>
+ * The supported signature of conv function is<br>
  * (STRING, INTEGER, INTEGER) -> STRING<br>
  * (INTEGER, INTEGER, INTEGER) -> STRING
  */
```
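The `(STRING, INTEGER, INTEGER) -> STRING` contract above can be exercised with the JDK's own radix support, which also makes the exception behavior discussed in the next comment visible: `Long.parseLong` throws `NumberFormatException` for digits that are invalid in the source base rather than yielding "0". A stdlib-only sketch, not the project's actual ConvFunction implementation:

```java
public class BaseConversionSketch {
    // Convert numStr from fromBase to toBase using JDK radix parsing/formatting.
    static String conv(String numStr, int fromBase, int toBase) {
        // Throws NumberFormatException on digits invalid for fromBase.
        long value = Long.parseLong(numStr, fromBase);
        return Long.toString(value, toBase).toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(conv("ff", 16, 2));   // 11111111
        System.out.println(conv("-12", 10, 16)); // -C
        try {
            conv("xyz", 16, 10);
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException for invalid digits");
        }
    }
}
```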
61-74: Pre-existing: JavaDoc does not match implementation behavior.

The JavaDoc states the method returns `"0"` for invalid input and `null` for out-of-range bases, but the actual implementation will throw `NumberFormatException` in both cases. Consider either updating the documentation to reflect the actual exception-throwing behavior, or wrapping the logic in a try-catch to match the documented contract.

📝 Option 1: Update JavaDoc to match implementation

```diff
 /**
  * Convert numStr from fromBase to toBase
  *
  * @param numStr the number to convert (case-insensitive for alphanumeric digits, may have a
  *     leading '-')
  * @param fromBase base of the input number (2 to 36)
  * @param toBase target base (2 to 36)
- * @return the converted number in the target base (uppercase), "0" if the input is invalid, or
- *     null if bases are out of range.
+ * @return the converted number in the target base
+ * @throws NumberFormatException if numStr is invalid for fromBase or if bases are out of range
  */
```

core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/TimestampFunction.java (1)
75-84: Consider adding JavaDoc to public static methods.

The static `timestamp` methods are public API entry points called via reflection. Adding JavaDoc with `@param` and `@return` tags would improve maintainability, especially documenting the expected input types and the return value format.

📝 Example JavaDoc
```diff
+  /**
+   * Converts a datetime value to a timestamp string.
+   *
+   * @param properties function properties containing timezone context
+   * @param datetime the datetime value to convert
+   * @return timestamp string representation
+   */
   public static String timestamp(FunctionProperties properties, ExprValue datetime) {
     return (String) convertToTimestampValue(datetime, properties).valueForCalcite();
   }

+  /**
+   * Converts a datetime value to timestamp and adds a time offset.
+   *
+   * @param properties function properties containing timezone context
+   * @param datetime the base datetime value
+   * @param addTime the time value to add to the base datetime
+   * @return timestamp string representation after adding the time offset
+   */
   public static String timestamp(
       FunctionProperties properties, ExprValue datetime, ExprValue addTime) {
```

core/src/main/java/org/opensearch/sql/expression/function/udf/math/ScalarMinFunction.java (1)
54-56: Consider adding JavaDoc to the public `min` method.

Per coding guidelines, public methods should have JavaDoc documentation. While this is pre-existing code, adding a brief doc comment would improve maintainability:
```java
/**
 * Returns the minimum value from the given arguments using mixed-type comparison.
 *
 * @param args array of values to compare
 * @return the minimum value, or null if all values are null
 */
public static Object min(Object[] args) {
```

core/src/main/java/org/opensearch/sql/expression/function/udf/RexExtractMultiFunction.java (1)
47-58: Add JavaDoc for the public method.

`getOperandMetadata` is public and should include JavaDoc (with `@return`, and `@throws` if applicable). As per coding guidelines.

📝 Suggested update
```diff
+  /**
+   * Returns operand metadata for REX_EXTRACT_MULTI validation.
+   *
+   * @return operand metadata describing accepted operands
+   */
   @Override
   public @NonNull UDFOperandMetadata getOperandMetadata() {
```

core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonArrayLengthFunctionImpl.java (1)
54-63: Consider adding null safety to the `eval` method.

The `eval` method has potential robustness issues in the existing code:

- Using `assert` for argument validation can be disabled at runtime
- If `gson.fromJson` returns `null` (e.g., for JSON literal `"null"`), `target.size()` will throw NPE

Since `NullPolicy.ANY` indicates this function handles nulls itself, explicit null handling would be safer.

♻️ Suggested improvement
```diff
 public static Object eval(Object... args) {
-  assert args.length == 1 : "Json array length only accept one argument";
+  if (args.length != 1) {
+    throw new IllegalArgumentException("Json array length only accepts one argument");
+  }
   String value = (String) args[0];
+  if (value == null) {
+    return null;
+  }
   try {
     List<?> target = gson.fromJson(value, List.class);
-    return target.size();
+    return target != null ? target.size() : null;
   } catch (JsonSyntaxException e) {
     return null;
   }
 }
```

core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImpl.java (1)
44-45: Consider stricter operand count validation.

Using `OperandTypes.VARIADIC` allows calls with 0 or 1 arguments, which the `eval` method handles by returning `null`. While this is safe, requiring at least 2 arguments (json_string + path) at validation time would provide earlier feedback to users.

♻️ Optional: Enforce minimum 2 operands
```diff
 @Override
 public @NonNull UDFOperandMetadata getOperandMetadata() {
-  return UDFOperandMetadata.wrap(OperandTypes.VARIADIC);
+  return UDFOperandMetadata.wrap(
+      OperandTypes.repeat(SqlOperandCountRanges.from(2), OperandTypes.ANY));
 }
```

core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonDeleteFunctionImpl.java (1)
39-40: Consider requiring at least 2 operands.

`OperandTypes.ONE_OR_MORE` allows calls with just a JSON string and no paths to delete, which is semantically meaningless. The function should require at least 2 arguments: the JSON string and at least one path to delete.

♻️ Proposed fix: Enforce minimum 2 operands
```diff
 @Override
 public @NonNull UDFOperandMetadata getOperandMetadata() {
-  return UDFOperandMetadata.wrap(OperandTypes.ONE_OR_MORE);
+  return UDFOperandMetadata.wrap(
+      OperandTypes.repeat(SqlOperandCountRanges.from(2), OperandTypes.ANY));
 }
```

core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/UnixTimestampFunction.java (1)
51-53: Add JavaDoc for the public override.

Line 52 changes the public API signature; please add a brief JavaDoc with `@return` to comply with project guidelines. As per coding guidelines, ...

✍️ Suggested JavaDoc addition
```diff
+  /**
+   * Returns operand metadata for unix_timestamp.
+   *
+   * @return non-null operand metadata
+   */
   @Override
   public @NonNull UDFOperandMetadata getOperandMetadata() {
     return PPLOperandTypes.OPTIONAL_DATE_OR_TIMESTAMP_OR_NUMERIC;
   }
```

core/src/main/java/org/opensearch/sql/expression/function/udf/SpanFunction.java (1)
60-70: Consider adding JavaDoc for this public method.

The refactor to `OperandTypes.family(...).or(...)` chains looks correct and aligns with the PR's goal of moving away from `CompositeOperandTypeChecker`. The `@NonNull` annotation is a good addition.

Per coding guidelines, public methods should have JavaDoc documentation explaining the operand types and their semantics. Consider documenting what each combination represents:
- CHARACTER/NUMERIC/CHARACTER: time-based span on string fields
- DATETIME/NUMERIC/CHARACTER: time-based span on datetime fields
- NUMERIC/NUMERIC/ANY: numeric span (no time unit required)
core/src/main/java/org/opensearch/sql/calcite/utils/UserDefinedFunctionUtils.java
```diff
 RelDataTypeFactory typeFactory = sqlOperatorBinding.getTypeFactory();
-RexCallBinding rexCallBinding = (RexCallBinding) sqlOperatorBinding;
-List<RexNode> operands = rexCallBinding.operands();
-RelDataType lambdaReturnType = ((RexLambda) operands.get(1)).getExpression().getType();
+RelDataType lambdaReturnType = sqlOperatorBinding.getOperandType(1);
 return createArrayType(
     typeFactory, typeFactory.createTypeWithNullability(lambdaReturnType, true), true);
```
🧩 Analysis chain
🏁 Script executed:
find . -type f -name "TransformFunctionImpl.java" | head -20Repository: opensearch-project/sql
Length of output: 166
🏁 Script executed:
cat -n ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.javaRepository: opensearch-project/sql
Length of output: 10999
🏁 Script executed:
# Search for other lambda-based collection functions and how they handle return types
find . -path ./target -prune -o -type f -name "*.java" -print | xargs grep -l "getOperandType.*lambda\|lambda.*getOperandType" 2>/dev/null | head -10Repository: opensearch-project/sql
Length of output: 166
🏁 Script executed:
# Search for similar patterns with SqlReturnTypeInference in CollectionUDF directory
find ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF -type f -name "*.java" | head -20Repository: opensearch-project/sql
Length of output: 1529
🏁 Script executed:
cat -n ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/FilterFunctionImpl.javaRepository: opensearch-project/sql
Length of output: 3150
🏁 Script executed:
cat -n ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ReduceFunctionImpl.javaRepository: opensearch-project/sql
Length of output: 7288
🏁 Script executed:
cat -n ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/LambdaUtils.javaRepository: opensearch-project/sql
Length of output: 996
🏁 Script executed:
# Search for TransformFunctionImpl tests
find . -path ./target -prune -o -type f -name "*TransformFunctionImplTest*" -o -name "*TransformTest*" | grep -i testRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Broader search for lambda-related tests
find ./core/src/test -type f -name "*.java" | xargs grep -l "TransformFunctionImpl\|transform.*lambda" 2>/dev/null | head -10Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Search for SqlLambda usage and getExpression calls in the codebase
rg "SqlLambda" --type java -A 3 | head -50Repository: opensearch-project/sql
Length of output: 1907
🏁 Script executed:
# Check if there's any validation or type binding code that explains lambda operand types
rg "getOperandType.*1\)" --type java -B 2 -A 2 | head -60Repository: opensearch-project/sql
Length of output: 5563
🏁 Script executed:
# Look at how ReduceFunctionImpl and TransformFunctionImpl differ in handling binding types
grep -B 5 -A 10 "getReturnTypeInference" ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.javaRepository: opensearch-project/sql
Length of output: 624
🏁 Script executed:
# Search for any tests that use transform function
find ./core/src/test -type f -name "*.java" -o -name "*.sql" | xargs grep -l "transform" 2>/dev/null | head -5Repository: opensearch-project/sql
Length of output: 374
🏁 Script executed:
# Search for any documentation or comments about lambda operand types
rg "lambda.*type\|FUNCTION.*type" --type java -i | grep -i "lambda\|function" | head -20Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Check if there are any RexCallBinding vs SqlCallBinding handling patterns for lambdas
rg "RexCallBinding\|SqlCallBinding" ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ --type java -B 2 -A 5 | head -60Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Search for any direct usage of SqlCallBinding in the transform/lambda related code
rg "instanceof.*Binding" ./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ --type java -B 3 -A 3Repository: opensearch-project/sql
Length of output: 1488
🏁 Script executed:
# Check the Calcite version used in the project
find . -name "pom.xml" -o -name "build.gradle" | xargs grep -i "calcite" | grep -i "version" | head -5Repository: opensearch-project/sql
Length of output: 293
🏁 Script executed:
# Search for any commits or comments about TransformFunctionImpl implementation
git log --oneline --all -- "./core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java" 2>/dev/null | head -10Repository: opensearch-project/sql
Length of output: 307
🏁 Script executed:
# Look for any integration tests or examples showing transform usage
find . -name "*.sql" -o -name "*.md" | xargs grep -l "transform" 2>/dev/null | head -5Repository: opensearch-project/sql
Length of output: 243
🏁 Script executed:
# Check the ppl-functions.md documentation for transform function
grep -A 10 -B 5 "transform" ./docs/user/ppl/cmd/ppl-functions.md 2>/dev/null || grep -A 10 -B 5 "transform" ./docs/dev/ppl-functions.md 2>/dev/nullRepository: opensearch-project/sql
Length of output: 634
🏁 Script executed:
# Look at the most recent commit details for TransformFunctionImpl
git show b1dca98d --stat 2>/dev/null | head -30Repository: opensearch-project/sql
Length of output: 1723
🏁 Script executed:
# Search for documentation or tests showing transform usage with different return types
find . -name "*.java" -path "*/test/*" | xargs grep -l "TransformFunctionImpl\|transform.*array" 2>/dev/null | head -5Repository: opensearch-project/sql
Length of output: 238
🏁 Script executed:
# Check if there are any comments or code patterns about when return type inference is called
rg "SqlReturnTypeInference\|getReturnTypeInference" ./core/src/main/java/org/opensearch/sql/expression/function/ --type java -B 2 -A 10 | head -80Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Check the array function integration test
cat -n ./integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java | head -100Repository: opensearch-project/sql
Length of output: 3717
🏁 Script executed:
# Check the PPL array function test
cat -n ./ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java | head -150Repository: opensearch-project/sql
Length of output: 6867
🏁 Script executed:
# Search for transform-specific tests or usage
rg "transform\(" --type java -i ./integ-test ./ppl | head -20Repository: opensearch-project/sql
Length of output: 1956
🏁 Script executed:
# Check the CalciteArrayFunctionIT for transform tests
grep -n "transform" ./integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.javaRepository: opensearch-project/sql
Length of output: 478
Extract lambda's return type explicitly for correct type inference during validation.
At the SqlCallBinding phase (validation), getOperandType(1) returns the FUNCTION type itself, not the lambda's return type. This differs from how ReduceFunctionImpl handles similar lambda-based operations. Follow the ReduceFunctionImpl pattern: distinguish between SqlCallBinding (extract SqlLambda, get expression, derive type via SqlTypeUtil.deriveType()) and RexCallBinding (use simple getOperandType() on already-compiled types). Without this, type inference may produce ARRAY<FUNCTION> instead of ARRAY<actual_return_type>.
🤖 Prompt for AI Agents
In
`@core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java`
around lines 55 - 58, TransformFunctionImpl currently uses
sqlOperatorBinding.getOperandType(1) which returns the FUNCTION type during
validation, causing incorrect ARRAY<FUNCTION> inference; update
TransformFunctionImpl to mirror ReduceFunctionImpl by distinguishing
SqlCallBinding vs RexCallBinding: when sqlOperatorBinding is an instance of
SqlCallBinding, extract the SqlLambda from the call, obtain the lambda
expression and derive its return type via SqlTypeUtil.deriveType(sqlValidator,
lambdaExpr) (or equivalent) and use that as lambdaReturnType; when it's a
RexCallBinding (already compiled), keep using getOperandType(1); update the
return type creation code that calls createArrayType(...) to use the derived
lambdaReturnType for correct ARRAY<actual_return_type>.
core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java
.../src/main/java/org/opensearch/sql/expression/function/udf/binning/MinspanBucketFunction.java
core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/ToSecondsFunction.java
core/src/main/java/org/opensearch/sql/expression/function/udf/math/EulerFunction.java
core/src/main/java/org/opensearch/sql/expression/function/udf/math/ModFunction.java
core/src/main/java/org/opensearch/sql/expression/function/UserDefinedFunctionBuilder.java
…mplementation - additionally update ppl-functions.md documentation Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
core/src/main/java/org/opensearch/sql/expression/function/udf/condition/EarliestFunction.java (1)
38-45: Add JavaDoc for public `getOperandMetadata()`.

Public methods under `core/src/main/java/**` require JavaDoc with `@return`. Please add a brief JavaDoc (especially now that the non-null contract is explicit).

✍️ Proposed JavaDoc
```diff
+  /**
+   * Returns operand metadata for the {@code earliest} function.
+   *
+   * @return non-null operand metadata
+   */
   @Override
   public @NonNull UDFOperandMetadata getOperandMetadata() {
     return PPLOperandTypes.STRING_TIMESTAMP;
   }
```

As per coding guidelines, ...
core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtendFunctionImpl.java (1)
62-64: Fix copy-paste error in exception message.

The error message incorrectly refers to "Json append function" but this is `JsonExtendFunctionImpl`.

Suggested fix
```diff
 if (keys.size() % 2 != 0) {
   throw new RuntimeException(
-      "Json append function needs corresponding path and values, but current get: " + keys);
+      "Json extend function needs corresponding path and values, but current get: " + keys);
 }
```

core/src/main/java/org/opensearch/sql/expression/function/UserDefinedFunctionBuilder.java (1)
66-75: Add unit tests for the new default methods in UserDefinedFunctionBuilder.

This interface with its new default methods (`getKind()`, `getOperandMetadata()`, `toUDF(String)`, and `toUDF(String, boolean)`) contains new business logic in `core/src/main/java` that must be covered by unit tests per coding guidelines. Current test coverage validates specific UDF implementations but does not test the default interface behavior itself.
🤖 Fix all issues with AI agents
In
`@core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java`:
- Around line 62-63: Add a JavaDoc comment to the public override method
getOperandMetadata in TransformFunctionImpl describing what the method returns
and include an `@return` tag that explains the returned UDFOperandMetadata (e.g.,
metadata about operands for this UDF). Ensure the JavaDoc appears immediately
above the method signature for the public `@NonNull` UDFOperandMetadata
getOperandMetadata() override and follows project JavaDoc style.
In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/SecToTimeFunction.java`:
- Around line 48-50: Add a JavaDoc comment to the public override
getOperandMetadata() in SecToTimeFunction that documents the method and its
return value; specifically, describe that it returns the operand metadata for
this UDF and include an `@return` tag stating it returns a UDFOperandMetadata
(e.g., PPLOperandTypes.NUMERIC) applicable to this function. Ensure the JavaDoc
sits immediately above the method declaration and follows project JavaDoc style.
In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/math/ModFunction.java`:
- Around line 46-49: Add a JavaDoc comment to the public method
getOperandMetadata() in ModFunction explaining what operand metadata it returns
(i.e., the expected operand types for the modulus UDF) and include a `@return` tag
that states it returns PPLOperandTypes.NUMERIC_NUMERIC; place the comment
immediately above the getOperandMetadata() declaration in ModFunction.java.
- Around line 41-49: Add a new test class ModFunctionTest.java that follows the
ToNumberFunctionTest pattern and verifies ModFunction's metadata (getKind()
returns SqlKind.MOD and getOperandMetadata() returns
PPLOperandTypes.NUMERIC_NUMERIC) and exercises the core logic by calling
integralMod() and floatingMod() with a variety of inputs: integer modulo cases,
floating-point modulo cases, mixed-type coercion (e.g., int % double and double
% int), and edge cases including divisor zero to assert the expected
exception/behavior; use the existing test utilities/assertions used by other UDF
tests to construct inputs and compare results.
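The integral-vs-floating modulo edge cases listed above can be sketched with plain Java semantics. The helper names below are illustrative, not the actual ModFunction code:

```java
public class ModEdgeCases {
    // Java's % on integral types throws ArithmeticException for a zero divisor,
    // while on doubles it follows IEEE 754 and yields NaN.
    static long integralMod(long a, long b) {
        return a % b;
    }

    static double floatingMod(double a, double b) {
        return a % b;
    }

    public static void main(String[] args) {
        System.out.println(integralMod(7, 3));     // 1
        System.out.println(floatingMod(7.5, 2.0)); // 1.5
        System.out.println(floatingMod(1.0, 0.0)); // NaN
        try {
            integralMod(1, 0);
        } catch (ArithmeticException e) {
            System.out.println("integral mod by zero throws ArithmeticException");
        }
    }
}
```

A test along these lines pins down exactly the divisor-zero behavior the review asks to cover.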
In
`@core/src/main/java/org/opensearch/sql/expression/function/udf/math/NumberToStringFunction.java`:
- Around line 39-41: Add a JavaDoc comment to the public override
getOperandMetadata() in NumberToStringFunction describing what the method
returns and any exceptions it may throw; reference the method signature public
`@NonNull` UDFOperandMetadata getOperandMetadata(), state that it returns the
operand metadata (PPLOperandTypes.NUMERIC) via an `@return` tag, and include an
`@throws` tag only if the method can throw checked or runtime exceptions —
otherwise explicitly note no exceptions are thrown. Ensure the JavaDoc follows
project style and is placed immediately above the getOperandMetadata() method.
In
`@core/src/main/java/org/opensearch/sql/expression/function/UserDefinedFunctionBuilder.java`:
- Around line 49-51: Add a JavaDoc comment to the public default method
getKind() in UserDefinedFunctionBuilder describing its purpose (returns the SQL
kind for this function), the return type (SqlKind, defaulting to
SqlKind.OTHER_FUNCTION), and any behavior/override expectations so consumers
know they can override it for other SqlKind values; place the comment
immediately above the getKind() declaration and follow existing project JavaDoc
style and tags (brief description and `@return`).
♻️ Duplicate comments (8)
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.java (1)
42-47: Class Javadoc documents unsupported STRING operand signatures.

The Javadoc claims support for `(IP, STRING)` and `(STRING, IP)` signatures, but the type checker at lines 135-136 only accepts IP types via `OpenSearchTypeUtil.isIp(type, true)`. The `getAllowedSignatures` method correctly reports `<IP>, <IP>`. Either update the Javadoc to remove STRING claims, or modify the type checker to accept VARCHAR if STRING support is intended.
core/src/main/java/org/opensearch/sql/expression/function/udf/math/EulerFunction.java (1)
42-44: Add explicit JavaDoc (or {@inheritDoc}) for the public override.

This public method still lacks an explicit JavaDoc/{@inheritDoc} block. Please add a brief doc (including `@return`) to meet the public-API documentation requirement. As per coding guidelines, ...

core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/ToSecondsFunction.java (2)
23-23: jspecify dependency concern already flagged.

The import addition is appropriate for the `@NonNull` annotation. The concern about verifying this dependency in `core/build.gradle` was already raised in a prior review.
53-56: Nullability contract improvement looks good.

The `@NonNull` annotation on `getOperandMetadata()` is appropriate since the method always returns a non-null constant. This aligns with the PR-wide pattern of strengthening nullability contracts across UDF operand metadata accessors.

The missing JavaDoc concern was already flagged in a prior review.
docs/dev/ppl-functions.md (1)
192-195: Inconsistent UDAF registration API documentation.

Lines 192-193 reference `AggBuilder::registerOperator(BuiltinFunctionName functionName, SqlAggFunction aggFunction)` for "standard registration," but line 195 documents `AggBuilder::register(BuiltinFunctionName functionName, AggHandler aggHandler)`. According to the AI summary, the API changed from `registerOperator` with `SqlAggFunction` to `register` with `AggHandler`.

The first bullet should be updated to reflect the new `register` API to avoid misleading developers.

📝 Suggested fix
```diff
-- Use `AggBuilder::registerOperator(BuiltinFunctionName functionName, SqlAggFunction aggFunction)` for standard
-  registration
+- Use `AggBuilder::register(BuiltinFunctionName functionName, AggHandler aggHandler)` for standard
+  registration
```

core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java (1)
56-74: Missing validation for odd argument count.The function signature
(json_string, path1, value1, path2, value2, ...)requires an odd total count. With an even count like 4 arguments, theevalmethod will throwIndexOutOfBoundsExceptionat line 105 when accessingkeys.get(i + 1)for the incomplete pair. Add odd-count validation incheckOperandTypes.🐛 Proposed fix
`@Override` public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) { final int count = callBinding.getOperandCount(); + // Require odd count: 1 json_string + n pairs of (path, value) + if (count % 2 == 0) { + if (throwOnFailure) { + throw callBinding.newValidationSignatureError(); + } + return false; + } for (int i = 1; i < count; i += 2) { RelDataType nameType = callBinding.getOperandType(i); if (!OpenSearchTypeUtil.isCharacter(nameType)) {core/src/main/java/org/opensearch/sql/expression/function/udf/math/ModFunction.java (1)
41-44: Add JavaDoc for the public `getKind()` override.

Public methods require JavaDoc with `@return`. As per coding guidelines, please document this method accordingly.

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java (1)
52-59: Derive the lambda return type instead of FUNCTION during validation.

Line 56 uses `getOperandType(1)`, which (under `SqlCallBinding`) yields the lambda's FUNCTION type, not its return type; this can infer `ARRAY<FUNCTION>` and propagate incorrect casts. Follow the `ReduceFunctionImpl` pattern: when the binding is `SqlCallBinding`, extract the `SqlLambda` expression and derive its return type; fall back to `getOperandType(1)` for `RexCallBinding`.

🐛 Proposed fix
```diff
     return sqlOperatorBinding -> {
       RelDataTypeFactory typeFactory = sqlOperatorBinding.getTypeFactory();
-      RelDataType lambdaReturnType = sqlOperatorBinding.getOperandType(1);
+      RelDataType lambdaReturnType;
+      if (sqlOperatorBinding instanceof SqlCallBinding) {
+        SqlCallBinding callBinding = (SqlCallBinding) sqlOperatorBinding;
+        SqlNode lambdaNode = callBinding.operand(1);
+        if (lambdaNode instanceof org.apache.calcite.sql.SqlLambda) {
+          SqlNode lambdaExpr =
+              ((org.apache.calcite.sql.SqlLambda) lambdaNode).getExpression();
+          lambdaReturnType = SqlTypeUtil.deriveType(callBinding, lambdaExpr);
+        } else {
+          lambdaReturnType = callBinding.getOperandType(1);
+        }
+      } else {
+        lambdaReturnType = sqlOperatorBinding.getOperandType(1);
+      }
       return createArrayType(
           typeFactory, typeFactory.createTypeWithNullability(lambdaReturnType, true), true);
     };
```

```bash
#!/bin/bash
# Compare lambda return-type inference patterns (Transform vs Reduce)
rg -n "class ReduceFunctionImpl|getReturnTypeInference|SqlLambda|deriveType" \
  core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ReduceFunctionImpl.java -C 3
```
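Returning to the JsonSet odd-argument-count issue above: why an even count breaks can be shown with a stdlib-only sketch of the same pair iteration. `PairArgsDemo` is illustrative, not the actual JsonSet code:

```java
import java.util.List;

public class PairArgsDemo {
    // Iterating (json, path1, value1, path2, value2, ...) in steps of two:
    // an even total count leaves the last path without a value, so reject it up front.
    static void validatePairs(List<String> args) {
        if (args.size() % 2 == 0) {
            throw new IllegalArgumentException("expected odd argument count, got " + args.size());
        }
        for (int i = 1; i < args.size(); i += 2) {
            String path = args.get(i);
            String value = args.get(i + 1); // safe only because the total count is odd
            System.out.println(path + " -> " + value);
        }
    }

    public static void main(String[] args) {
        validatePairs(List.of("{}", "a.b", "1")); // ok: one complete (path, value) pair
        try {
            validatePairs(List.of("{}", "a.b", "1", "c.d")); // 4 args: incomplete pair
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Without the up-front guard, the same loop would fail later with `IndexOutOfBoundsException`, which is exactly the failure mode the review describes.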
🧹 Nitpick comments (5)
core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/FromUnixTimeFunction.java (1)
56-58: Add JavaDoc for public method per guidelines.

`getOperandMetadata()` is public and currently has no JavaDoc. Please add a brief JavaDoc with an `@return` tag. As per coding guidelines.

📝 Proposed JavaDoc
```diff
+  /**
+   * Returns operand metadata for FROM_UNIXTIME.
+   *
+   * @return operand metadata describing numeric or numeric+string arguments
+   */
   @Override
   public @NonNull UDFOperandMetadata getOperandMetadata() {
     return PPLOperandTypes.NUMERIC_OPTIONAL_STRING;
   }
```

core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/PeriodNameFunction.java (1)
66-77: Remove unused variable `dateType`.

The variable `dateType` is computed on lines 69-71 but never used. This appears to be dead code.

♻️ Suggested fix
```diff
 @Override
 public Expression implement(
     RexToLixTranslator translator, RexCall call, List<Expression> translatedOperands) {
-  ExprType dateType =
-      OpenSearchTypeFactory.convertRelDataTypeToExprType(
-          call.getOperands().getFirst().getType());
   return Expressions.call(
       PeriodNameFunctionImplementor.class,
       "name",
       translatedOperands.getFirst(),
       Expressions.constant(periodUnit));
 }
```

core/src/main/java/org/opensearch/sql/expression/function/udf/binning/WidthBucketFunction.java (1)
136-139: Pre-existing: Floating-point modulo equality comparison may be imprecise.

This is not part of your changes, but `maxValue % optimalWidth == 0` may not behave as expected due to floating-point precision. For example, `10.0 % 0.1` may not equal exactly `0.0`. Consider using a tolerance-based comparison if this causes issues.

♻️ Optional fix
```diff
-    if (maxValue % optimalWidth == 0) {
+    double remainder = maxValue % optimalWidth;
+    if (Math.abs(remainder) < 1e-10 || Math.abs(remainder - optimalWidth) < 1e-10) {
```

core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/ExtractFunction.java (1)
46-53: Add JavaDoc `@return` for the non-null contract.

Public methods in this module should document their return contract; the new non-null guarantee should be explicitly stated.

📝 Suggested JavaDoc
```diff
+  /**
+   * Returns operand metadata for the EXTRACT function.
+   *
+   * @return non-null operand metadata for (STRING, DATE/TIME/TIMESTAMP/STRING)
+   */
   @Override
   public @NonNull UDFOperandMetadata getOperandMetadata() {
     return PPLOperandTypes.STRING_DATETIME;
   }
```

As per coding guidelines, ...
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ReduceFunctionImpl.java (1)
47-78: Add a defensive arity check before accessing operand(2).

If an invalid call slips through, this currently risks an `IndexOutOfBoundsException` instead of a clearer validation error. A small guard improves diagnostics.

♻️ Proposed fix
```diff
 if (sqlOperatorBinding instanceof RexCallBinding) {
   return sqlOperatorBinding.getOperandType(sqlOperatorBinding.getOperandCount() - 1);
 } else if (sqlOperatorBinding instanceof SqlCallBinding callBinding) {
+  if (callBinding.getOperandCount() < 3) {
+    throw new IllegalArgumentException(
+        "reduce requires at least 3 operands (array, base, lambda)");
+  }
   RelDataType elementType = callBinding.getOperandType(0).getComponentType();
   RelDataType baseType = callBinding.getOperandType(1);
   SqlLambda reduce1 = callBinding.getCall().operand(2);
```
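The floating-point remainder pitfall noted for WidthBucketFunction above can be checked directly. The helper names here are illustrative:

```java
public class FpModuloDemo {
    // 0.1 has no exact binary representation, so 10.0 % 0.1 evaluates to a
    // value close to 0.1 rather than 0.0, and the exact == test fails.
    static boolean isMultipleExact(double value, double width) {
        return value % width == 0;
    }

    // Tolerance-based check: accept remainders near 0 or near the width itself.
    static boolean isMultipleWithTolerance(double value, double width) {
        double r = value % width;
        return Math.abs(r) < 1e-10 || Math.abs(r - width) < 1e-10;
    }

    public static void main(String[] args) {
        System.out.println(10.0 % 0.1); // close to 0.1, not 0.0
        System.out.println(isMultipleExact(10.0, 0.1));         // false
        System.out.println(isMultipleWithTolerance(10.0, 0.1)); // true
    }
}
```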
...rc/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java
core/src/main/java/org/opensearch/sql/expression/function/udf/datetime/SecToTimeFunction.java
core/src/main/java/org/opensearch/sql/expression/function/udf/math/ModFunction.java
core/src/main/java/org/opensearch/sql/expression/function/udf/math/ModFunction.java
core/src/main/java/org/opensearch/sql/expression/function/udf/math/NumberToStringFunction.java
core/src/main/java/org/opensearch/sql/expression/function/UserDefinedFunctionBuilder.java
LantaoJin
left a comment
Haven't completed the full code review; many unexpected scripts were introduced in Clickbench. Please fix them, then we can proceed.
```java
synchronized (this) {
  // Double-Checked Locking for thread-safety
  if (validator == null) {
```
Why could a thread-safety problem happen here? The CalcitePlanContext is created per query. Do you want to create a SqlValidator per query, or share one SqlValidator across different queries?
I used to think a validator could be reused by multiple queries, but if a new context is created for every query, there is no need to do so. I also later found out that SqlValidator is stateful and cannot be shared. I have updated the code to remove it.
```java
/** Cached SqlValidator instance (lazy initialized). */
private volatile SqlValidator validator;
```
Cache for what? Do you mean some queries could skip validation? Since CalcitePlanContext is created per query, if every query has to be validated no matter what it is, lazy initialization is not needed.
Removed
```java
defaultSchema != null ? CalciteSchema.from(defaultSchema) : prepareContext.getRootSchema();
CalciteCatalogReader catalogReader =
    new CalciteCatalogReader(
        schema.root(), schema.path(null), typeFactory, prepareContext.config());
```
If we use CalciteConnectionConfig.DEFAULT as the last parameter of CalciteCatalogReader(), is it possible to remove the CalciteServerStatement from create's parameters?
Yes, it's possible. Updated the implementation to eliminate the dependency.
| * Embed existing collation into window function's over clauses. | ||
| * | ||
| * <p>Window functions with frame specifications like {@code ROWS n PRECEDING} require ORDER BY to | ||
| * determine row order. Without it, results are non-deterministic. | ||
| * | ||
| * <p>Without this fix, the initial plan has ORDER BY separate from window functions: | ||
| * | ||
| * <pre> | ||
| * LogicalProject(SUM($5) OVER (ROWS 1 PRECEDING)) ← Missing ORDER BY | ||
| * LogicalSort(sort0=[$5]) | ||
| * </pre> | ||
| * | ||
| * <p>This causes problems during validation as the order is not bound to the window. With this | ||
| * fix, sort collations are embedded into each {@code RexOver} window: | ||
| * | ||
| * <pre> | ||
| * LogicalProject(SUM($5) OVER (ORDER BY $5 ROWS 1 PRECEDING)) ← ORDER BY embedded | ||
| * </pre> | ||
| * | ||
| * @param overExpressions Window function expressions (may contain nested {@link RexOver}) | ||
| * @param context Plan context for building RexNodes | ||
| * @return Expressions with ORDER BY embedded in all window specifications | ||
| */ | ||
| private List<RexNode> embedExistingCollationsIntoOver( |
Can this change fix issue #5004?
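The transformation described in the Javadoc above can be sketched with plain stand-in types (a hypothetical `Window` class in place of Calcite's RexWindow and RelCollation, not the actual implementation):

```java
import java.util.ArrayList;
import java.util.List;

class CollationEmbedding {
  static final class Window {
    final List<String> orderKeys;
    final String frame;

    Window(List<String> orderKeys, String frame) {
      this.orderKeys = orderKeys;
      this.frame = frame;
    }
  }

  // If the window spec has no ORDER BY of its own, inherit the outer sort
  // collation so the frame (e.g. "ROWS 1 PRECEDING") becomes deterministic.
  static Window embed(Window w, List<String> outerCollation) {
    if (!w.orderKeys.isEmpty()) {
      return w; // already ordered; leave untouched
    }
    return new Window(new ArrayList<>(outerCollation), w.frame);
  }
}
```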
| if (nullCount == 0 && anyCount == 0) { | ||
| RelDataType udt; |
why return udt only when nullCount == 0 && anyCount == 0?
I have updated the condition to anyCount == 0 alone.
If there is ANY involved, the common type will be ANY anyway; I delegate this to Calcite's implementation of leastRestrictive. Since SqlTypeName.NULL can be cast to any possible type (not the ANY type), including UDTs, I include it now.
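The rule described above can be sketched in plain Java (a simplification under stated assumptions; returning null stands for delegating to Calcite's leastRestrictive):

```java
import java.util.List;

class CommonUdtType {
  // Returns the UDT name when all non-NULL operands share it and no ANY is
  // involved; returns null to mean "delegate to Calcite's leastRestrictive".
  static String commonUdt(List<String> typeNames, String udt) {
    boolean sawUdt = false;
    for (String t : typeNames) {
      if (t.equals("ANY")) {
        return null; // ANY wins; let Calcite resolve the common type to ANY
      }
      if (t.equals("NULL")) {
        continue; // NULL can be cast to any type, including UDTs
      }
      if (!t.equals(udt)) {
        return null; // mixed types; delegate
      }
      sawUdt = true;
    }
    return sawUdt ? udt : null;
  }
}
```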
| LogicalFilter(condition=[SEARCH($28, Sarg[[20..200]])]) | ||
| LogicalSort(fetch=[10]) | ||
| LogicalProject(agent=[$0], process=[$6], log=[$8], message=[$11], tags=[$12], cloud=[$13], input=[$15], @timestamp=[$17], ecs=[$18], data_stream=[$20], meta=[$24], host=[$26], metrics=[$27], aws=[$30], event=[$35]) | ||
| LogicalFilter(condition=[AND(>=($28, 20), <=($28, 200))]) |
I'm just curious how the SEARCH expression is generated from a SqlNode.
| LogicalFilter(condition=[<>(SAFE_CAST($19), 0)]) | ||
| CalciteLogicalIndexScan(table=[[OpenSearch, hits]]) | ||
| physical: | | ||
| CalciteEnumerableIndexScan(table=[[OpenSearch, hits]], PushDownContext=[[FILTER-><>($0, 0), AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={},count()=COUNT()), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","query":{"bool":{"must":[{"exists":{"field":"AdvEngineID","boost":1.0}}],"must_not":[{"term":{"AdvEngineID":{"value":0,"boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}},"track_total_hits":2147483647}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)]) | ||
| CalciteEnumerableIndexScan(table=[[OpenSearch, hits]], PushDownContext=[[SCRIPT-><>(SAFE_CAST($0), 0), AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={},count()=COUNT()), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","query":{"script":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQCSnsKICAib3AiOiB7CiAgICAibmFtZSI6ICI8PiIsCiAgICAia2luZCI6ICJOT1RfRVFVQUxTIiwKICAgICJzeW50YXgiOiAiQklOQVJZIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAib3AiOiB7CiAgICAgICAgIm5hbWUiOiAiU0FGRV9DQVNUIiwKICAgICAgICAia2luZCI6ICJTQUZFX0NBU1QiLAogICAgICAgICJzeW50YXgiOiAiU1BFQ0lBTCIKICAgICAgfSwKICAgICAgIm9wZXJhbmRzIjogWwogICAgICAgIHsKICAgICAgICAgICJkeW5hbWljUGFyYW0iOiAwLAogICAgICAgICAgInR5cGUiOiB7CiAgICAgICAgICAgICJ0eXBlIjogIlNNQUxMSU5UIiwKICAgICAgICAgICAgIm51bGxhYmxlIjogdHJ1ZQogICAgICAgICAgfQogICAgICAgIH0KICAgICAgXSwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiSU5URUdFUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZQogICAgICB9CiAgICB9LAogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMSwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiQklHSU5UIiwKICAgICAgICAibnVsbGFibGUiOiB0cnVlCiAgICAgIH0KICAgIH0KICBdCn0=\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0,2],"DIGESTS":["AdvEngineID",0]}},"boost":1.0}},"track_total_hits":2147483647}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)]) |
It seems an unexpected script was introduced here.
The script is introduced by SAFE_CAST(AdvEngineID AS INTEGER).
I am thinking maybe this cast here does make sense. The PPL query is source=hits | where AdvEngineID!=0 | stats count(), where AdvEngineID is a smallint, and 0 is parsed as an integer. Therefore, AdvEngineID is promoted to their common type integer.
The cast seems to be unnecessary in 2 ways:
- why use `SAFE_CAST` instead of `CAST`, which won't introduce scripts
- why cast `AdvEngineID` to integer instead of casting `0` to `smallint`
After some investigation, these are the reasons:
- There is a PR that introduces SAFE_CAST so that malformed data won't cause runtime errors but return null instead. It enables the following use case (especially the last column)
sql/docs/user/ppl/functions/conversion.md
Lines 131 to 148 in 42b89f0
| ### Example: Use string in comparison operator | |
| ```ppl | |
| source=people | |
| | eval e="1000"==1000, en="1000"!=1000, ed="1000"==1000.0, edn="1000"!=1000.0, l="1000">999, ld="1000">999.9, i="malformed"==1000 | |
| | fields e, en, ed, edn, l, ld, i | |
| ``` | |
| Expected output: | |
| ```text | |
| fetched rows / total rows = 1/1 | |
| +------+-------+------+-------+------+------+------+ | |
| | e | en | ed | edn | l | ld | i | | |
| |------+-------+------+-------+------+------+------| | |
| | True | False | True | False | True | True | null | | |
| +------+-------+------+-------+------+------+------+ | |
| ``` |
- I can of course always cast the literal side, i.e. cast `0` to `smallint` in this case, but what if it is not `0` but Integer.MAX? Casting `AdvEngineID` to integer would be safer.
What's your opinion on how to fix it?
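Setting the eventual optimizer rule aside, the unwrap idea for the SMALLINT column vs. INTEGER literal case can be sketched with primitives (a hypothetical helper, not Calcite's API):

```java
class UnwrapCastSketch {
  // Instead of comparing CAST(col AS INTEGER) with an int literal,
  // compare col directly when the literal fits the column's SMALLINT range.
  static boolean smallintNotEquals(short col, int literal) {
    if (literal < Short.MIN_VALUE || literal > Short.MAX_VALUE) {
      return true; // the literal cannot equal any SMALLINT value
    }
    return col != (short) literal; // safe to narrow the literal side
  }
}
```

This addresses the Integer.MAX concern above: when the literal is out of the column's range, the comparison can be folded to a constant instead of widening the column.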
What's your opinion on how to fix it?
No problem with adding SAFE_CAST(AdvEngineID AS INTEGER) in the original logical plan, but we need to remove the cast in pushdown IMO.
Refer to Spark SQL:
create table t1(col1 ShortType) using parquet;
explain extended select * from t1 where col1 = 0;
In its logical plan, we see the cast(col1#5 as int)
== Analyzed Logical Plan ==
col1: smallint
Project [col1#5]
+- Filter (cast(col1#5 as int) = 0)
+- SubqueryAlias spark_catalog.default.t1
+- Relation spark_catalog.default.t1[col1#5] parquet
But the cast is removed in the optimized logical plan:
== Optimized Logical Plan ==
Filter (isnotnull(col1#5) AND (col1#5 = 0))
+- Relation spark_catalog.default.t1[col1#5] parquet
and its physical PushedFilters
== Physical Plan ==
*(1) Filter (isnotnull(col1#5) AND (col1#5 = 0))
+- *(1) ColumnarToRow
+- FileScan parquet spark_catalog.default.t1[col1#5] Batched: true, DataFilters: [isnotnull(col1#5), (col1#5 = 0)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/ltjin/Package/spark-3.5.4-bin-hadoop3/spark-warehouse/t1], PartitionFilters: [], PushedFilters: [IsNotNull(col1), EqualTo(col1,0)], ReadSchema: struct<col1:smallint>
TODO: we need an UnwrapCastInBinaryComparison-like rule in the optimizer
@coderabbitai please open an issue about implementing an UnwrapCastInBinaryComparison-like rule in the optimizer
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
TITLE="Optimizer: Unwrap CAST/SAFE_CAST in binary comparisons to enable filter pushdown (Spark parity)"
BODY_FILE="$(mktemp)"
cat > "$BODY_FILE" << 'EOF'
Summary
Implement an optimizer rule equivalent to Spark SQL’s “UnwrapCastInBinaryComparison” to remove redundant CAST/SAFE_CAST on the column side of binary comparisons. This allows predicates to be pushed down as normal FILTERs instead of SCRIPTs.
Motivation / Problem
During Calcite validation, mismatched types can introduce SAFE_CAST on the attribute side (e.g., SMALLINT column vs INTEGER literal). This causes SCRIPT-based pushdown and prevents efficient filter pushdown.
Concrete example
- Work item surfaced in PR `#4892`.
- ClickBench q2: AdvEngineID (SMALLINT) compared to 0 (INTEGER) ends up as SAFE_CAST(AdvEngineID) <> 0 which becomes a SCRIPT in pushdown rather than a FILTER.
Desired behavior (high-level)
- When a binary comparison has CAST/SAFE_CAST on the column side and a foldable literal on the other side, unwrap the cast from the column and, if needed, cast the literal to the column’s type.
- Preserve null semantics appropriately (e.g., add isNotNull(col) where required by the comparison semantics).
- Result: comparisons like SAFE_CAST(col) <> 0 become col <> SMALLINT(0), enabling FILTER pushdown (no script).
Scope (initial)
- Operators: =, <>, <, <=, >, >=, BETWEEN, IN (scalar/tuple), and relevant IP variants where applicable.
- Only when one side is CAST/SAFE_CAST of an attribute (or a simple expression derived from it) and the other side is a literal/foldable.
- Do not change behavior in cases that would alter correctness (e.g., overflow/precision edge cases without a safe equivalent).
References
- Spark optimizer rule: UnwrapCastInBinaryComparison
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
Backlinks
- PR: https://github.com/opensearch-project/sql/pull/4892
- Comment: https://github.com/opensearch-project/sql/pull/4892#discussion_r2730526656
Acceptance criteria
- Predicates that currently generate SCRIPT pushdown solely due to CAST/SAFE_CAST on the column side are rewritten to enable FILTER pushdown.
- No regressions in correctness; add integration coverage (e.g., ClickBench q2) demonstrating the switch from SCRIPT to FILTER pushdown.
- Document the rule and any corner cases (null handling, overflow behavior) in developer docs.
Requested by
`@LantaoJin`
EOF
# Create the issue in the current repo context and assign to the requester
gh issue create --title "$TITLE" --body-file "$BODY_FILE" --assignee "LantaoJin"
Length of output: 53
@LantaoJin done — opened the issue and assigned it to you:
Details:
- Title: Optimizer: Unwrap CAST/SAFE_CAST in binary comparisons to enable filter pushdown (Spark parity)
- Includes summary, motivation, Spark reference, scope, acceptance criteria, and backlinks to this PR and your comment.
- Example referenced: ClickBench q2 (SAFE_CAST(AdvEngineID) <> 0 leading to SCRIPT pushdown).
Want me to add labels (e.g., optimizer, enhancement, calcite) or assign it to someone else?
🧠 Learnings used
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2026-01-26T22:19:54.446Z
Learning: Applies to **/calcite/**/*Test.java : Test SQL generation and optimization paths in Calcite integration
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.
After some investigation, these are the reasons:
- There is a PR that introduces SAFE_CAST so that malformed data won't cause runtime errors but return null. It enables the following use case (especially the last column)
@yuancu, back to your investigation: how could a runtime error happen? a != 0 implies that a is not null, since if a is null, it returns false in a database. So we can optimize a != 0 to isnotnull(a) AND NOT(a = 0), and a = 0 to isnotnull(a) AND a = 0.
Example on PostgreSQL:
-- create
CREATE TABLE EMPLOYEE (
empId INTEGER PRIMARY KEY,
name TEXT NOT NULL,
dept TEXT NOT NULL,
col1 SMALLINT
);
-- insert
INSERT INTO EMPLOYEE VALUES (0001, 'Clark', 'Sales', 1);
INSERT INTO EMPLOYEE VALUES (0002, 'Dave', 'Accounting', 0);
INSERT INTO EMPLOYEE VALUES (0003, 'Ava', 'Sales');
-- fetch
SELECT * FROM EMPLOYEE WHERE col1 != 0;
returns
empid | name | dept | col1
-------+-------+-------+------
1 | Clark | Sales | 1
and
SELECT * FROM EMPLOYEE WHERE col1 = 0;
returns
empid | name | dept | col1
-------+------+------------+------
2 | Dave | Accounting | 0
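The rewrite proposed above, a != 0 to isnotnull(a) AND NOT(a = 0), matches SQL's three-valued WHERE semantics shown in the PostgreSQL output (the row with a null col1 is excluded by both filters); a plain-Java sketch with a boxed, nullable column value:

```java
class NullSemantics {
  // SQL WHERE keeps a row only when the predicate evaluates to TRUE
  // (a NULL column makes the comparison UNKNOWN, which is filtered out).
  static boolean whereNotEquals(Integer col, int literal) {
    return col != null && !(col.intValue() == literal); // isnotnull(a) AND NOT(a = literal)
  }
}
```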
| LogicalFilter(condition=[AND(=($103, 62), >=($0, TIMESTAMP('2013-07-01 00:00:00')), <=($0, TIMESTAMP('2013-07-31 00:00:00')), =(SAFE_CAST($42), 0), =(SAFE_CAST($72), 0), <>($26, ''))]) | ||
| CalciteLogicalIndexScan(table=[[OpenSearch, hits]]) | ||
| physical: | | ||
| CalciteEnumerableIndexScan(table=[[OpenSearch, hits]], PushDownContext=[[FILTER->AND(=($4, 62), SEARCH($0, Sarg[['2013-07-01 00:00:00':VARCHAR..'2013-07-31 00:00:00':VARCHAR]]:VARCHAR), =($2, 0), =($3, 0), <>($1, '')), AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={1},PageViews=COUNT()), SORT_AGG_METRICS->[1 DESC LAST], PROJECT->[PageViews, URL], LIMIT->10, LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","query":{"bool":{"must":[{"term":{"CounterID":{"value":62,"boost":1.0}}},{"range":{"EventDate":{"from":"2013-07-01T00:00:00.000Z","to":"2013-07-31T00:00:00.000Z","include_lower":true,"include_upper":true,"format":"date_time","boost":1.0}}},{"term":{"DontCountHits":{"value":0,"boost":1.0}}},{"term":{"IsRefresh":{"value":0,"boost":1.0}}},{"bool":{"must":[{"exists":{"field":"URL","boost":1.0}}],"must_not":[{"term":{"URL":{"value":"","boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"aggregations":{"URL":{"terms":{"field":"URL","size":10,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)]) | ||
| CalciteEnumerableIndexScan(table=[[OpenSearch, hits]], PushDownContext=[[SCRIPT->AND(=($4, 62), SEARCH($0, Sarg[['2013-07-01 00:00:00':EXPR_TIMESTAMP VARCHAR..'2013-07-31 00:00:00':EXPR_TIMESTAMP VARCHAR]]:EXPR_TIMESTAMP VARCHAR), =(SAFE_CAST($2), 0), =(SAFE_CAST($3), 0), <>($1, '')), AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={1},PageViews=COUNT()), SORT_AGG_METRICS->[1 DESC LAST], PROJECT->[PageViews, URL], LIMIT->10, LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","query":{"bool":{"must":[{"term":{"CounterID":{"value":62,"boost":1.0}}},{"range":{"EventDate":{"from":"2013-07-01T00:00:00.000Z","to":"2013-07-31T00:00:00.000Z","include_lower":true,"include_upper":true,"format":"date_time","boost":1.0}}},{"script":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQCRXsKICAib3AiOiB7CiAgICAibmFtZSI6ICI9IiwKICAgICJraW5kIjogIkVRVUFMUyIsCiAgICAic3ludGF4IjogIkJJTkFSWSIKICB9LAogICJvcGVyYW5kcyI6IFsKICAgIHsKICAgICAgIm9wIjogewogICAgICAgICJuYW1lIjogIlNBRkVfQ0FTVCIsCiAgICAgICAgImtpbmQiOiAiU0FGRV9DQVNUIiwKICAgICAgICAic3ludGF4IjogIlNQRUNJQUwiCiAgICAgIH0sCiAgICAgICJvcGVyYW5kcyI6IFsKICAgICAgICB7CiAgICAgICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgICAgICJ0eXBlIjogewogICAgICAgICAgICAidHlwZSI6ICJTTUFMTElOVCIsCiAgICAgICAgICAgICJudWxsYWJsZSI6IHRydWUKICAgICAgICAgIH0KICAgICAgICB9CiAgICAgIF0sCiAgICAgICJ0eXBlIjogewogICAgICAgICJ0eXBlIjogIklOVEVHRVIiLAogICAgICAgICJudWxsYWJsZSI6IHRydWUKICAgICAgfQogICAgfSwKICAgIHsKICAgICAgImR5bmFtaWNQYXJhbSI6IDEsCiAgICAgICJ0eXBlIjogewogICAgICAgICJ0eXBlIjogIkJJR0lOVCIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZQogICAgICB9CiAgICB9CiAgXQp9\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 
0,"SOURCES":[0,2],"DIGESTS":["DontCountHits",0]}},"boost":1.0}},{"script":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQCRXsKICAib3AiOiB7CiAgICAibmFtZSI6ICI9IiwKICAgICJraW5kIjogIkVRVUFMUyIsCiAgICAic3ludGF4IjogIkJJTkFSWSIKICB9LAogICJvcGVyYW5kcyI6IFsKICAgIHsKICAgICAgIm9wIjogewogICAgICAgICJuYW1lIjogIlNBRkVfQ0FTVCIsCiAgICAgICAgImtpbmQiOiAiU0FGRV9DQVNUIiwKICAgICAgICAic3ludGF4IjogIlNQRUNJQUwiCiAgICAgIH0sCiAgICAgICJvcGVyYW5kcyI6IFsKICAgICAgICB7CiAgICAgICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgICAgICJ0eXBlIjogewogICAgICAgICAgICAidHlwZSI6ICJTTUFMTElOVCIsCiAgICAgICAgICAgICJudWxsYWJsZSI6IHRydWUKICAgICAgICAgIH0KICAgICAgICB9CiAgICAgIF0sCiAgICAgICJ0eXBlIjogewogICAgICAgICJ0eXBlIjogIklOVEVHRVIiLAogICAgICAgICJudWxsYWJsZSI6IHRydWUKICAgICAgfQogICAgfSwKICAgIHsKICAgICAgImR5bmFtaWNQYXJhbSI6IDEsCiAgICAgICJ0eXBlIjogewogICAgICAgICJ0eXBlIjogIkJJR0lOVCIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZQogICAgICB9CiAgICB9CiAgXQp9\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0,2],"DIGESTS":["IsRefresh",0]}},"boost":1.0}},{"bool":{"must":[{"exists":{"field":"URL","boost":1.0}}],"must_not":[{"term":{"URL":{"value":"","boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"aggregations":{"URL":{"terms":{"field":"URL","size":10,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)]) |
ditto
| CalciteLogicalIndexScan(table=[[OpenSearch, hits]]) | ||
| physical: | | ||
| CalciteEnumerableIndexScan(table=[[OpenSearch, hits]], PushDownContext=[[FILTER->AND(=($4, 62), SEARCH($0, Sarg[['2013-07-01 00:00:00':VARCHAR..'2013-07-31 00:00:00':VARCHAR]]:VARCHAR), =($1, 0), =($2, 0), <>($3, '')), AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={3},PageViews=COUNT()), SORT_AGG_METRICS->[1 DESC LAST], PROJECT->[PageViews, Title], LIMIT->10, LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","query":{"bool":{"must":[{"term":{"CounterID":{"value":62,"boost":1.0}}},{"range":{"EventDate":{"from":"2013-07-01T00:00:00.000Z","to":"2013-07-31T00:00:00.000Z","include_lower":true,"include_upper":true,"format":"date_time","boost":1.0}}},{"term":{"DontCountHits":{"value":0,"boost":1.0}}},{"term":{"IsRefresh":{"value":0,"boost":1.0}}},{"bool":{"must":[{"exists":{"field":"Title","boost":1.0}}],"must_not":[{"term":{"Title":{"value":"","boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"aggregations":{"Title":{"terms":{"field":"Title","size":10,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)]) | ||
| CalciteEnumerableIndexScan(table=[[OpenSearch, hits]], PushDownContext=[[SCRIPT->AND(=($4, 62), SEARCH($0, Sarg[['2013-07-01 00:00:00':EXPR_TIMESTAMP VARCHAR..'2013-07-31 00:00:00':EXPR_TIMESTAMP VARCHAR]]:EXPR_TIMESTAMP VARCHAR), =(SAFE_CAST($1), 0), =(SAFE_CAST($2), 0), <>($3, '')), AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={3},PageViews=COUNT()), SORT_AGG_METRICS->[1 DESC LAST], PROJECT->[PageViews, Title], LIMIT->10, LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","query":{"bool":{"must":[{"term":{"CounterID":{"value":62,"boost":1.0}}},{"range":{"EventDate":{"from":"2013-07-01T00:00:00.000Z","to":"2013-07-31T00:00:00.000Z","include_lower":true,"include_upper":true,"format":"date_time","boost":1.0}}},{"script":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQCRXsKICAib3AiOiB7CiAgICAibmFtZSI6ICI9IiwKICAgICJraW5kIjogIkVRVUFMUyIsCiAgICAic3ludGF4IjogIkJJTkFSWSIKICB9LAogICJvcGVyYW5kcyI6IFsKICAgIHsKICAgICAgIm9wIjogewogICAgICAgICJuYW1lIjogIlNBRkVfQ0FTVCIsCiAgICAgICAgImtpbmQiOiAiU0FGRV9DQVNUIiwKICAgICAgICAic3ludGF4IjogIlNQRUNJQUwiCiAgICAgIH0sCiAgICAgICJvcGVyYW5kcyI6IFsKICAgICAgICB7CiAgICAgICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgICAgICJ0eXBlIjogewogICAgICAgICAgICAidHlwZSI6ICJTTUFMTElOVCIsCiAgICAgICAgICAgICJudWxsYWJsZSI6IHRydWUKICAgICAgICAgIH0KICAgICAgICB9CiAgICAgIF0sCiAgICAgICJ0eXBlIjogewogICAgICAgICJ0eXBlIjogIklOVEVHRVIiLAogICAgICAgICJudWxsYWJsZSI6IHRydWUKICAgICAgfQogICAgfSwKICAgIHsKICAgICAgImR5bmFtaWNQYXJhbSI6IDEsCiAgICAgICJ0eXBlIjogewogICAgICAgICJ0eXBlIjogIkJJR0lOVCIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZQogICAgICB9CiAgICB9CiAgXQp9\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 
0,"SOURCES":[0,2],"DIGESTS":["DontCountHits",0]}},"boost":1.0}},{"script":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQCRXsKICAib3AiOiB7CiAgICAibmFtZSI6ICI9IiwKICAgICJraW5kIjogIkVRVUFMUyIsCiAgICAic3ludGF4IjogIkJJTkFSWSIKICB9LAogICJvcGVyYW5kcyI6IFsKICAgIHsKICAgICAgIm9wIjogewogICAgICAgICJuYW1lIjogIlNBRkVfQ0FTVCIsCiAgICAgICAgImtpbmQiOiAiU0FGRV9DQVNUIiwKICAgICAgICAic3ludGF4IjogIlNQRUNJQUwiCiAgICAgIH0sCiAgICAgICJvcGVyYW5kcyI6IFsKICAgICAgICB7CiAgICAgICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgICAgICJ0eXBlIjogewogICAgICAgICAgICAidHlwZSI6ICJTTUFMTElOVCIsCiAgICAgICAgICAgICJudWxsYWJsZSI6IHRydWUKICAgICAgICAgIH0KICAgICAgICB9CiAgICAgIF0sCiAgICAgICJ0eXBlIjogewogICAgICAgICJ0eXBlIjogIklOVEVHRVIiLAogICAgICAgICJudWxsYWJsZSI6IHRydWUKICAgICAgfQogICAgfSwKICAgIHsKICAgICAgImR5bmFtaWNQYXJhbSI6IDEsCiAgICAgICJ0eXBlIjogewogICAgICAgICJ0eXBlIjogIkJJR0lOVCIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZQogICAgICB9CiAgICB9CiAgXQp9\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0,2],"DIGESTS":["IsRefresh",0]}},"boost":1.0}},{"bool":{"must":[{"exists":{"field":"Title","boost":1.0}}],"must_not":[{"term":{"Title":{"value":"","boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"aggregations":{"Title":{"terms":{"field":"Title","size":10,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)]) |
ditto
| LogicalFilter(condition=[AND(=($103, 62), >=($0, TIMESTAMP('2013-07-01 00:00:00':VARCHAR)), <=($0, TIMESTAMP('2013-07-31 00:00:00':VARCHAR)), =($72, 0), <>($49, 0), =($35, 0))]) | ||
| CalciteLogicalIndexScan(table=[[OpenSearch, hits]]) | ||
| LogicalProject(EventDate=[$0], URLRegionID=[$1], HasGCLID=[$2], Income=[$3], Interests=[$4], Robotness=[$5], BrowserLanguage=[$6], CounterClass=[$7], BrowserCountry=[$8], OriginalURL=[$9], ClientTimeZone=[$10], RefererHash=[$11], TraficSourceID=[$12], HitColor=[$13], RefererRegionID=[$14], URLCategoryID=[$15], LocalEventTime=[$16], EventTime=[$17], UTMTerm=[$18], AdvEngineID=[$19], UserAgentMinor=[$20], UserAgentMajor=[$21], RemoteIP=[$22], Sex=[$23], JavaEnable=[$24], URLHash=[$25], URL=[$26], ParamOrderID=[$27], OpenstatSourceID=[$28], HTTPError=[$29], SilverlightVersion3=[$30], MobilePhoneModel=[$31], SilverlightVersion4=[$32], SilverlightVersion1=[$33], SilverlightVersion2=[$34], IsDownload=[$35], IsParameter=[$36], CLID=[$37], FlashMajor=[$38], FlashMinor=[$39], UTMMedium=[$40], WatchID=[$41], DontCountHits=[$42], CookieEnable=[$43], HID=[$44], SocialAction=[$45], WindowName=[$46], ConnectTiming=[$47], PageCharset=[$48], IsLink=[$49], IsArtifical=[$50], JavascriptEnable=[$51], ClientEventTime=[$52], DNSTiming=[$53], CodeVersion=[$54], ResponseEndTiming=[$55], FUniqID=[$56], WindowClientHeight=[$57], OpenstatServiceName=[$58], UTMContent=[$59], HistoryLength=[$60], IsOldCounter=[$61], MobilePhone=[$62], SearchPhrase=[$63], FlashMinor2=[$64], SearchEngineID=[$65], IsEvent=[$66], UTMSource=[$67], RegionID=[$68], OpenstatAdID=[$69], UTMCampaign=[$70], GoodEvent=[$71], IsRefresh=[$72], ParamCurrency=[$73], Params=[$74], ResolutionHeight=[$75], ClientIP=[$76], FromTag=[$77], ParamCurrencyID=[$78], ResponseStartTiming=[$79], ResolutionWidth=[$80], SendTiming=[$81], RefererCategoryID=[$82], OpenstatCampaignID=[$83], UserID=[$84], WithHash=[$85], UserAgent=[$86], ParamPrice=[$87], ResolutionDepth=[$88], IsMobile=[$89], Age=[$90], SocialSourceNetworkID=[$91], OpenerName=[$92], OS=[$93], IsNotBounce=[$94], Referer=[$95], NetMinor=[$96], Title=[$97], NetMajor=[$98], IPNetworkID=[$99], FetchTiming=[$100], SocialNetwork=[$101], SocialSourcePage=[$102], CounterID=[$103], 
WindowClientWidth=[$104], _id=[$105], _index=[$106], _score=[$107], _maxscore=[$108], _sort=[$109], _routing=[$110]) | ||
| LogicalFilter(condition=[AND(=($103, 62), >=($0, TIMESTAMP('2013-07-01 00:00:00')), <=($0, TIMESTAMP('2013-07-31 00:00:00')), =(SAFE_CAST($72), 0), <>(SAFE_CAST($49), 0), =(SAFE_CAST($35), 0))]) |
ditto
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…mance Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 1780-1803: Add unit tests exercising the new
embedExistingCollationsIntoOver logic in CalciteRelNodeVisitor by creating or
extending CalciteRexNodeVisitorTest (or a new test class) to cover: window
functions that already have ORDER BY, window functions lacking ORDER BY so that
embedExistingCollationsIntoOver injects collations into RexOver instances,
windows with ROWS/RANGE frame specifications, and queries with multiple window
functions; assert the produced RexOver or plan string contains the expected
ORDER BY clauses and that frame behavior remains unchanged, referencing the
method embedExistingCollationsIntoOver, the CalciteRelNodeVisitor class and
RexOver nodes to locate where to construct and validate test inputs.
In `@core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java`:
- Around line 58-70: Update the Javadoc for the public static
PplValidator.create method to match its actual parameters: remove the outdated
`@param` statement entry and add `@param` tags for frameworkConfig, operatorTable,
typeFactory, and validatorConfig (briefly describe each), and ensure there is an
`@return` tag describing the configured PplValidator instance; keep any `@throws`
tags only if the method actually throws exceptions. Reference:
PplValidator.create(FrameworkConfig, SqlOperatorTable, RelDataTypeFactory,
SqlValidator.Config).
- Around line 50-119: Add unit tests exercising PplValidator.deriveType and
getValidatedNodeTypeIfKnown plus the helper conversion methods
userDefinedTypeToSqlType and sqlTypeToUserDefinedType: cover primitive UDTs
(DATE, TIME, TIMESTAMP, BINARY), nested types (ARRAY, MAP, MULTISET, STRUCT),
and nullability preservation across round-trips. Also address the IP type
asymmetry by updating sqlTypeToUserDefinedType to recognize the SQL-side
NULLABLE_IP_UDT and map it back to the IP UDT (preserving nullable flag) or add
tests that assert the asymmetry if intentional; place all tests in the same
commit per guidelines.
- Around line 136-153: In PplValidator.userDefinedTypeToSqlType the IP branch
currently returns UserDefinedFunctionUtils.NULLABLE_IP_UDT (a UDT) which
violates the method's contract of converting UDTs to SQL types; change the
ExprCoreType.IP branch to use the OpenSearchTypeFactory to create a
SqlTypeName.OTHER SQL type with the same nullability (i.e., call
typeFactory.createSqlType(SqlTypeName.OTHER, t.isNullable())) so IP is
represented as SqlTypeName.OTHER during validation instead of a UDT.
| /** | ||
| * Embed existing collation into window function's over clauses. | ||
| * | ||
| * <p>Window functions with frame specifications like {@code ROWS n PRECEDING} require ORDER BY to | ||
| * determine row order. Without it, results are non-deterministic. | ||
| * | ||
| * <p>Without this fix, the initial plan has ORDER BY separate from window functions: | ||
| * | ||
| * <pre> | ||
| * LogicalProject(SUM($5) OVER (ROWS 1 PRECEDING)) ← Missing ORDER BY | ||
| * LogicalSort(sort0=[$5]) | ||
| * </pre> | ||
| * | ||
| * <p>This causes problems during validation as the order is not bound to the window. With this | ||
| * fix, sort collations are embedded into each {@code RexOver} window: | ||
| * | ||
| * <pre> | ||
| * LogicalProject(SUM($5) OVER (ORDER BY $5 ROWS 1 PRECEDING)) ← ORDER BY embedded | ||
| * </pre> | ||
| * | ||
| * @param overExpressions Window function expressions (may contain nested {@link RexOver}) | ||
| * @param context Plan context for building RexNodes | ||
| * @return Expressions with ORDER BY embedded in all window specifications | ||
| */ |
🧩 Analysis chain
🏁 Script executed:
# Search for existing Calcite window/collation/RexOver tests
rg -n --type=java "RexOver|window.*function|collation|StreamWindow" core/src/test --max-count=20
Repository: opensearch-project/sql
Length of output: 1951
🏁 Script executed:
# Search more broadly for test files related to Calcite
find core/src/test -name "*Calcite*Test.java" -o -name "*Window*Test.java" | head -20
Repository: opensearch-project/sql
Length of output: 1377
🏁 Script executed:
# Look for tests that might cover the embedding logic
rg -n "embed|collation" core/src/test --type=java
Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Check CalciteRelNodeVisitorSearchSimpleTest.java for content
cat core/src/test/java/org/opensearch/sql/calcite/CalciteRelNodeVisitorSearchSimpleTest.java | head -100
Repository: opensearch-project/sql
Length of output: 3043
🏁 Script executed:
# Look for any test methods related to window functions or collation in Calcite test files
rg -n "void test|public void" core/src/test/java/org/opensearch/sql/calcite/ --type=java | grep -i "window\|collation\|over\|rexover" -iRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Search for the actual implementation: look at CalciteRelNodeVisitor around lines 1780-1803
sed -n '1770,1810p' core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.javaRepository: opensearch-project/sql
Length of output: 1868
🏁 Script executed:
# Search for any existing tests that might cover the window/over logic in Calcite tests
rg -n "OVER|RexOver" core/src/test/java/org/opensearch/sql/calcite/ --type=java -A 2Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Search for tests that explicitly test embedExistingCollationsIntoOver
rg -n "embedExistingCollationsIntoOver" . --type=javaRepository: opensearch-project/sql
Length of output: 363
🏁 Script executed:
# Check if there are any Calcite-specific tests that cover window with collations
rg -n "window|collation|RexOver" core/src/test/java/org/opensearch/sql/calcite/CalciteRexNodeVisitorTest.java --type=java -A 3 | head -50Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Check the full test file to understand what's covered
wc -l core/src/test/java/org/opensearch/sql/calcite/CalciteRexNodeVisitorTest.javaRepository: opensearch-project/sql
Length of output: 146
🏁 Script executed:
# Look for any test that mentions window or over in method names
rg -n "@Test\s*\n\s*(public\s+)?void\s+\w*[Ww]indow|@Test\s*\n\s*(public\s+)?void\s+\w*[Oo]ver" core/src/test/java/org/opensearch/sql/calcite/ --type=javaRepository: opensearch-project/sql
Length of output: 257
🏁 Script executed:
# Read CalciteRexNodeVisitorTest.java to see what's tested
cat core/src/test/java/org/opensearch/sql/calcite/CalciteRexNodeVisitorTest.javaRepository: opensearch-project/sql
Length of output: 8794
🏁 Script executed:
# Search more carefully for any window tests that might be related
rg -n "test.*[Ww]indow" core/src/test/java/org/opensearch/sql/calcite/ --type=javaRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Check if there are integration tests that test window functions with Calcite
find core/src/test -name "*IntegrationTest*.java" -o -name "*IT.java" | xargs grep -l "OVER\|window" 2>/dev/nullRepository: opensearch-project/sql
Length of output: 50
Add Calcite integration tests for window function collation embedding.
The new embedExistingCollationsIntoOver method (lines 1780–1803) handles critical window function semantics by injecting ORDER BY into OVER clauses. Per coding guidelines, all new business logic requires unit tests in the same commit. Add tests in the Calcite test suite (e.g., CalciteRexNodeVisitorTest.java or a new dedicated test class) covering:
- Window functions with explicit ORDER BY
- Window functions without explicit ORDER BY (triggering collation embedding)
- Window functions with frame specifications (ROWS/RANGE clauses)
- Multiple window functions in the same query
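The real tests would exercise Calcite's RexOver/RexWindow through CalciteRelNodeVisitor; as a self-contained illustration of the transform being tested, here is a toy sketch in which WindowSpec and embedCollations are invented stand-ins (not the project's actual classes) for "rebuild each window with the plan's existing sort collation as its ORDER BY":

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of ORDER BY embedding. WindowSpec and embedCollations are
 * illustrative names only; the real code rewrites Calcite RexOver nodes.
 */
public class CollationEmbeddingSketch {

  /** A window spec: partition keys plus order keys (possibly empty). */
  record WindowSpec(List<String> partitionKeys, List<String> orderKeys) {}

  /**
   * Mirrors the intent of embedExistingCollationsIntoOver: every window is
   * rebuilt with the plan's existing collation so the frame is deterministic.
   */
  static List<WindowSpec> embedCollations(List<WindowSpec> windows, List<String> planCollation) {
    List<WindowSpec> out = new ArrayList<>();
    for (WindowSpec w : windows) {
      out.add(new WindowSpec(w.partitionKeys(), planCollation));
    }
    return out;
  }

  public static void main(String[] args) {
    // A window with a frame but no ORDER BY, plus an existing sort on $5
    List<WindowSpec> windows = List.of(new WindowSpec(List.of("dept"), List.of()));
    List<WindowSpec> fixed = embedCollations(windows, List.of("$5"));
    if (!fixed.get(0).orderKeys().equals(List.of("$5"))) {
      throw new AssertionError("collation not embedded");
    }
    System.out.println("embedded ORDER BY: " + fixed.get(0).orderKeys());
  }
}
```

A real test would build the same shape with RexBuilder and assert the plan string contains the embedded ORDER BY.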
🤖 Prompt for AI Agents
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`
around lines 1780 - 1803, Add unit tests exercising the new
embedExistingCollationsIntoOver logic in CalciteRelNodeVisitor by creating or
extending CalciteRexNodeVisitorTest (or a new test class) to cover: window
functions that already have ORDER BY, window functions lacking ORDER BY so that
embedExistingCollationsIntoOver injects collations into RexOver instances,
windows with ROWS/RANGE frame specifications, and queries with multiple window
functions; assert the produced RexOver or plan string contains the expected
ORDER BY clauses and that frame behavior remains unchanged, referencing the
method embedExistingCollationsIntoOver, the CalciteRelNodeVisitor class and
RexOver nodes to locate where to construct and validate test inputs.
| public class PplValidator extends SqlValidatorImpl { | ||
| /** | ||
| * Tracks whether the current deriveType call is at the top level (true) or a recursive call | ||
| * (false). Top-level calls return user-defined types, while recursive calls return SQL types for | ||
| * internal validation. | ||
| */ | ||
| private boolean top; | ||
|
|
||
| /** | ||
| * Creates a SqlValidator configured for PPL validation. | ||
| * | ||
| * @param statement Calcite server statement | ||
| * @param frameworkConfig Framework configuration | ||
| * @param operatorTable SQL operator table to use for validation | ||
| * @return configured SqlValidator instance | ||
| */ | ||
| public static PplValidator create( | ||
| FrameworkConfig frameworkConfig, | ||
| SqlOperatorTable operatorTable, | ||
| RelDataTypeFactory typeFactory, | ||
| SqlValidator.Config validatorConfig) { | ||
| SchemaPlus defaultSchema = | ||
| Objects.requireNonNull(frameworkConfig.getDefaultSchema(), "defaultSchema"); | ||
|
|
||
| final CalciteSchema schema = CalciteSchema.from(defaultSchema); | ||
| CalciteCatalogReader catalogReader = | ||
| new CalciteCatalogReader( | ||
| schema.root(), schema.path(null), typeFactory, CalciteConnectionConfig.DEFAULT); | ||
| return new PplValidator(operatorTable, catalogReader, typeFactory, validatorConfig); | ||
| } | ||
|
|
||
| /** | ||
| * Creates a PPL validator. | ||
| * | ||
| * @param opTab Operator table containing PPL operators | ||
| * @param catalogReader Catalog reader for accessing schema information | ||
| * @param typeFactory Type factory for creating type information | ||
| * @param config Validator configuration | ||
| */ | ||
| protected PplValidator( | ||
| SqlOperatorTable opTab, | ||
| SqlValidatorCatalogReader catalogReader, | ||
| RelDataTypeFactory typeFactory, | ||
| Config config) { | ||
| super(opTab, catalogReader, typeFactory, config); | ||
| top = true; | ||
| } | ||
|
|
||
| /** | ||
| * Overrides the deriveType method to map user-defined types (UDTs) to SqlTypes so that they can | ||
| * be validated | ||
| */ | ||
| @Override | ||
| public RelDataType deriveType(SqlValidatorScope scope, SqlNode expr) { | ||
| // The type has to be sql type during type derivation & validation | ||
| boolean original = top; | ||
| top = false; | ||
| RelDataType type = super.deriveType(scope, expr); | ||
| top = original; | ||
| if (top) { | ||
| return sqlTypeToUserDefinedType(type); | ||
| } | ||
| return userDefinedTypeToSqlType(type); | ||
| } | ||
|
|
||
| @Override | ||
| public @Nullable RelDataType getValidatedNodeTypeIfKnown(SqlNode node) { | ||
| RelDataType type = super.getValidatedNodeTypeIfKnown(node); | ||
| return sqlTypeToUserDefinedType(type); | ||
| } |
🧩 Analysis chain
🏁 Script executed:
# Find PplValidator test files and existing validator tests
fd -e java -p "Test.java" -x rg -l "PplValidator" {}
Repository: opensearch-project/sql
Length of output: 48

🏁 Script executed:
# Look for type conversion and UDT-related tests
rg -n --type=java "sqlTypeToUserDefinedType|userDefinedTypeToSqlType|UDT" -g "*Test.java"
Repository: opensearch-project/sql
Length of output: 10758

🏁 Script executed:
# Search for implementations of the type conversion methods in main code
rg -n --type=java "sqlTypeToUserDefinedType|userDefinedTypeToSqlType" --glob="**/*.java" -A 3
Repository: opensearch-project/sql
Length of output: 1865

🏁 Script executed:
# Find the PplValidator class to see full context
fd -type f -name "PplValidator.java" -not -path "*/Test.java"
Repository: opensearch-project/sql
Length of output: 236

🏁 Script executed:
# Search for any test files that might test PplValidator
find . -name "*Test.java" -o -name "*IT.java" | xargs grep -l "PplValidator" 2>/dev/null
Repository: opensearch-project/sql
Length of output: 50

🏁 Script executed:
# Get the full PplValidator.java file to see complete implementations
cat -n core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java
Repository: opensearch-project/sql
Length of output: 9729

🏁 Script executed:
# Check if there's a test directory for calcite validators
fd -type f -name "*.java" "core/src/test" | grep -i validator
Repository: opensearch-project/sql
Length of output: 236

🏁 Script executed:
# Search more carefully for tests mentioning PplValidator
rg -l "PplValidator" --type java
Repository: opensearch-project/sql
Length of output: 208

🏁 Script executed:
# Check if PplValidator is mentioned in any tests at all
rg "PplValidator" core/src/test -r --type java
Repository: opensearch-project/sql
Length of output: 110

🏁 Script executed:
# Check git log for when PplValidator was added/modified
git log --oneline -20 core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java
Repository: opensearch-project/sql
Length of output: 149

🏁 Script executed:
# Check git log more thoroughly with -p to see what changed in recent commits
git log --oneline -50 core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java
Repository: opensearch-project/sql
Length of output: 149

🏁 Script executed:
# Check when CalcitePlanContext started using PplValidator
git log --oneline core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java | head -20
Repository: opensearch-project/sql
Length of output: 149

🏁 Script executed:
# Check if deriveType override is recent by examining blame
git blame -L 103,113 core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java
Repository: opensearch-project/sql
Length of output: 1057

🏁 Script executed:
# Look for validator integration tests
find core/src/test -type f -name "*.java" -exec grep -l "validate\|Validator" {} \; | head -10
Repository: opensearch-project/sql
Length of output: 458

🏁 Script executed:
# Verify IP type handling - check if there's a reverse mapping for IP
rg "VARBINARY|IP.*Sql" core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java
Repository: opensearch-project/sql
Length of output: 48

🏁 Script executed:
# Check if ValidationUtils.createUDTWithAttributes handles IP
cat core/src/main/java/org/opensearch/sql/calcite/validate/ValidationUtils.java
Repository: opensearch-project/sql
Length of output: 7184
Add unit tests for PplValidator type conversion and verify IP type round-trip mapping.
New code in deriveType(), getValidatedNodeTypeIfKnown(), and helper methods implements non-trivial UDT↔SQL type conversions. Add tests covering:
- Primitive UDTs (DATE, TIME, TIMESTAMP, BINARY)
- Nested types (arrays, maps, multisets, structs)
- Nullability preservation across conversions
Additionally, verify the IP type mapping: userDefinedTypeToSqlType() converts IP to NULLABLE_IP_UDT, but sqlTypeToUserDefinedType() lacks a corresponding reverse case to map SQL types back to IP UDT. Confirm this asymmetry is intentional or implement the missing conversion.
All new business logic requires unit tests in the same commit per coding guidelines.
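To make the flagged round-trip asymmetry concrete, here is a toy model using string tags in place of RelDataType instances; the two mapping tables are transcribed from this review's description of userDefinedTypeToSqlType/sqlTypeToUserDefinedType, not from the real PplValidator code:

```java
import java.util.Map;

/** Toy model of the UDT <-> SQL-type mapping described in the review. */
public class TypeMappingSketch {
  // UDT -> SQL type used during validation (mirrors userDefinedTypeToSqlType)
  static final Map<String, String> UDT_TO_SQL =
      Map.of("DATE", "DATE", "TIME", "TIME", "TIMESTAMP", "TIMESTAMP",
             "BINARY", "BINARY", "IP", "OTHER");

  // SQL type -> UDT restored after validation (mirrors sqlTypeToUserDefinedType).
  // Note: no entry maps back to IP -- the asymmetry the review flags.
  static final Map<String, String> SQL_TO_UDT =
      Map.of("DATE", "DATE", "TIME", "TIME", "TIMESTAMP", "TIMESTAMP", "BINARY", "BINARY");

  /** Convert a UDT to its validation-time SQL type, then back. */
  static String roundTrip(String udt) {
    String sql = UDT_TO_SQL.get(udt);
    return SQL_TO_UDT.getOrDefault(sql, sql);
  }

  public static void main(String[] args) {
    if (!"DATE".equals(roundTrip("DATE"))) throw new AssertionError();
    // In this model IP does not survive the round trip: it comes back as OTHER.
    System.out.println("IP round-trips to: " + roundTrip("IP"));
  }
}
```

A real unit test would perform the same round trip on actual RelDataType values (including nullability) and either assert the IP asymmetry is intentional or fail until the reverse mapping is added.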
🤖 Prompt for AI Agents
In `@core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java`
around lines 50 - 119, Add unit tests exercising PplValidator.deriveType and
getValidatedNodeTypeIfKnown plus the helper conversion methods
userDefinedTypeToSqlType and sqlTypeToUserDefinedType: cover primitive UDTs
(DATE, TIME, TIMESTAMP, BINARY), nested types (ARRAY, MAP, MULTISET, STRUCT),
and nullability preservation across round-trips. Also address the IP type
asymmetry by updating sqlTypeToUserDefinedType to recognize the SQL-side
NULLABLE_IP_UDT and map it back to the IP UDT (preserving nullable flag) or add
tests that assert the asymmetry if intentional; place all tests in the same
commit per guidelines.
| /** | ||
| * Creates a SqlValidator configured for PPL validation. | ||
| * | ||
| * @param statement Calcite server statement | ||
| * @param frameworkConfig Framework configuration | ||
| * @param operatorTable SQL operator table to use for validation | ||
| * @return configured SqlValidator instance | ||
| */ | ||
| public static PplValidator create( | ||
| FrameworkConfig frameworkConfig, | ||
| SqlOperatorTable operatorTable, | ||
| RelDataTypeFactory typeFactory, | ||
| SqlValidator.Config validatorConfig) { |
Fix Javadoc parameter list to match the method signature.
Line 61: Javadoc still mentions a statement parameter and omits typeFactory and validatorConfig, which violates the JavaDoc requirement for public methods.
✏️ Proposed fix
- * `@param` statement Calcite server statement
- * `@param` frameworkConfig Framework configuration
- * `@param` operatorTable SQL operator table to use for validation
- * `@return` configured SqlValidator instance
+ * `@param` frameworkConfig Framework configuration
+ * `@param` operatorTable SQL operator table to use for validation
+ * `@param` typeFactory Type factory for creating type information
+ * `@param` validatorConfig Validator configuration
+ * `@return` configured SqlValidator instance
As per coding guidelines: All public classes and methods must have proper JavaDoc with @param, @return, and @throws.
🤖 Prompt for AI Agents
In `@core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java`
around lines 58 - 70, Update the Javadoc for the public static
PplValidator.create method to match its actual parameters: remove the outdated
`@param` statement entry and add `@param` tags for frameworkConfig, operatorTable,
typeFactory, and validatorConfig (briefly describe each), and ensure there is an
`@return` tag describing the configured PplValidator instance; keep any `@throws`
tags only if the method actually throws exceptions. Reference:
PplValidator.create(FrameworkConfig, SqlOperatorTable, RelDataTypeFactory,
SqlValidator.Config).
| private RelDataType userDefinedTypeToSqlType(RelDataType type) { | ||
| return convertType( | ||
| type, | ||
| t -> { | ||
| if (OpenSearchTypeUtil.isUserDefinedType(t)) { | ||
| AbstractExprRelDataType<?> exprType = (AbstractExprRelDataType<?>) t; | ||
| ExprType udtType = exprType.getExprType(); | ||
| OpenSearchTypeFactory typeFactory = (OpenSearchTypeFactory) this.getTypeFactory(); | ||
| return switch (udtType) { | ||
| case ExprCoreType.TIMESTAMP -> | ||
| typeFactory.createSqlType(SqlTypeName.TIMESTAMP, t.isNullable()); | ||
| case ExprCoreType.TIME -> typeFactory.createSqlType(SqlTypeName.TIME, t.isNullable()); | ||
| case ExprCoreType.DATE -> typeFactory.createSqlType(SqlTypeName.DATE, t.isNullable()); | ||
| case ExprCoreType.BINARY -> | ||
| typeFactory.createSqlType(SqlTypeName.BINARY, t.isNullable()); | ||
| case ExprCoreType.IP -> UserDefinedFunctionUtils.NULLABLE_IP_UDT; | ||
| default -> t; | ||
| }; |
🧩 Analysis chain
🏁 Script executed:
# Find the Calcite version in dependencies
fd -t f "(pom\.xml|build\.gradle)" | head -5
Repository: opensearch-project/sql
Length of output: 174

🏁 Script executed:
# Look for pom.xml in root and check Calcite version
cat pom.xml | grep -A 2 -B 2 "calcite" | head -30
Repository: opensearch-project/sql
Length of output: 105

🏁 Script executed:
# Check the actual implementation in PplValidator.java
find . -name "PplValidator.java" -type f
Repository: opensearch-project/sql
Length of output: 140

🏁 Script executed:
# Search for createTypeWithNullability usage in codebase
rg "createTypeWithNullability" -A 2 -B 2 | head -50
Repository: opensearch-project/sql
Length of output: 5990

🏁 Script executed:
# Search for createSqlType usage patterns
rg "createSqlType" -A 2 -B 2 | head -80
Repository: opensearch-project/sql
Length of output: 9474

🏁 Script executed:
# Find and check Calcite version in build.gradle files
grep -r "calcite" build.gradle | grep -i version | head -10
Repository: opensearch-project/sql
Length of output: 48

🏁 Script executed:
# Read the actual PplValidator.java file to see the implementation
cat -n core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java | sed -n '130,160p'
Repository: opensearch-project/sql
Length of output: 1549

🏁 Script executed:
# Search for NULLABLE_IP_UDT to understand the IP type constant
rg "NULLABLE_IP_UDT" -A 3 -B 3
Repository: opensearch-project/sql
Length of output: 3999

🏁 Script executed:
# Search for IP type handling in validation/type conversion
rg "isUserDefinedType\|IP.*type\|ExprCoreType.IP" -A 2 -B 2 | head -60
Repository: opensearch-project/sql
Length of output: 48

🏁 Script executed:
# Check the OpenSearchTypeFactory to see createSqlType signature
fd "OpenSearchTypeFactory.java" -type f
Repository: opensearch-project/sql
Length of output: 236

🏁 Script executed:
# Look for createSqlType method signature
rg "def createSqlType|createSqlType\s*\(" core/src/main/java -A 2 | head -40
Repository: opensearch-project/sql
Length of output: 3960

🏁 Script executed:
# Check build.gradle for Calcite version
cat build.gradle | grep -A 5 -B 5 "calcite"
Repository: opensearch-project/sql
Length of output: 48

🏁 Script executed:
# Find OpenSearchTypeFactory file
find . -name "OpenSearchTypeFactory.java" -type f
Repository: opensearch-project/sql
Length of output: 146

🏁 Script executed:
# Check the root build.gradle for calcite version more carefully
cat build.gradle | head -100
Repository: opensearch-project/sql
Length of output: 4014

🏁 Script executed:
# Look for calcite version in all build.gradle files
find . -name "build.gradle" -exec grep -l "calcite" {} \;
Repository: opensearch-project/sql
Length of output: 175

🏁 Script executed:
# Check the specific build.gradle files that mention calcite
find . -name "build.gradle" -exec grep -B 2 -A 2 "calcite" {} + | head -60
Repository: opensearch-project/sql
Length of output: 3662

🏁 Script executed:
# Read OpenSearchTypeFactory to understand createSqlType signature
cat -n ./core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java | head -150
Repository: opensearch-project/sql
Length of output: 7184

🏁 Script executed:
# Search for createSqlType method definition in OpenSearchTypeFactory
rg "createSqlType" ./core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java -A 5 -B 2 | head -40
Repository: opensearch-project/sql
Length of output: 1790

🏁 Script executed:
# Check if OpenSearchTypeFactory extends RelDataTypeFactory and uses parent's createSqlType
cat -n ./core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java | grep -A 5 "class OpenSearchTypeFactory"
Repository: opensearch-project/sql
Length of output: 383

🏁 Script executed:
# Let me understand the context of userDefinedTypeToSqlType method
cat -n ./core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java | sed -n '130,180p'
Repository: opensearch-project/sql
Length of output: 2640

🏁 Script executed:
# Check the convertType helper method
rg "private.*convertType" ./core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java -A 10
Repository: opensearch-project/sql
Length of output: 607

🏁 Script executed:
# Look for usage of userDefinedTypeToSqlType to understand its purpose
rg "userDefinedTypeToSqlType" ./core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java -B 3 -A 3
Repository: opensearch-project/sql
Length of output: 336

🏁 Script executed:
# Check how IP type is handled in other validation code
rg "SqlTypeName.OTHER\|IP.*createSqlType\|OTHER" core/src/main/java/org/opensearch/sql/calcite/validate/ -A 2 -B 2 | head -40
Repository: opensearch-project/sql
Length of output: 48

🏁 Script executed:
# Look at createUDTWithAttributes to understand UDT conversion pattern
rg "createUDTWithAttributes" ./core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java -A 8
Repository: opensearch-project/sql
Length of output: 997

🏁 Script executed:
# Search for IP type SQL representation recommendations
rg "IP.*VARCHAR\|IP.*OTHER\|ipv4\|ipv6" core/src/main/java -i | head -20
Repository: opensearch-project/sql
Length of output: 48

🏁 Script executed:
# Check the full sqlTypeToUserDefinedType to see if there's IP handling
cat -n ./core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java | sed -n '159,180p'
Repository: opensearch-project/sql
Length of output: 1261

🏁 Script executed:
# Look for ValidationUtils to see createUDTWithAttributes
find . -name "ValidationUtils.java" -type f
Repository: opensearch-project/sql
Length of output: 143

🏁 Script executed:
# Check if there are any tests for userDefinedTypeToSqlType with IP
rg "userDefinedTypeToSqlType.*IP\|ExprCoreType.IP.*test" -i | head -20
Repository: opensearch-project/sql
Length of output: 48

🏁 Script executed:
# Read ValidationUtils to understand createUDTWithAttributes
cat -n ./core/src/main/java/org/opensearch/sql/calcite/validate/ValidationUtils.java | head -100
Repository: opensearch-project/sql
Length of output: 5242

🏁 Script executed:
# Check if PplValidator has a way to convert IP UDT back to SQL (in sqlTypeToUserDefinedType)
cat -n ./core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java | sed -n '159,172p'
Repository: opensearch-project/sql
Length of output: 832

🏁 Script executed:
# Search for OTHER type usage to see if it's used for IP or custom types
rg "SqlTypeName.OTHER" core/src/main/java -B 2 -A 2 | head -40
Repository: opensearch-project/sql
Length of output: 4549
IP type conversion is inconsistent with other UDT types—should use SqlTypeName.OTHER instead of returning a UDT.
Line 151 returns UserDefinedFunctionUtils.NULLABLE_IP_UDT, which is a user-defined type. However, the method userDefinedTypeToSqlType is designed to convert all UDT types to their SQL equivalents (TIMESTAMP → SqlTypeName.TIMESTAMP, TIME → SqlTypeName.TIME, etc.). The IP case breaks this contract.
According to the codebase (OpenSearchTypeUtil.java and UserDefinedFunctionUtils.java), SqlTypeName.OTHER is the designated SQL type for representing IP during validation since Calcite lacks a native IP type. The IP case should follow the same pattern as other branches:
case ExprCoreType.IP -> typeFactory.createSqlType(SqlTypeName.OTHER, t.isNullable());
Note: The nullability handling in lines 145–150 using createSqlType(SqlTypeName, boolean) is correct; this is a custom OpenSearchTypeFactory overload that properly wraps with createTypeWithNullability internally.
🤖 Prompt for AI Agents
In `@core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java`
around lines 136 - 153, In PplValidator.userDefinedTypeToSqlType the IP branch
currently returns UserDefinedFunctionUtils.NULLABLE_IP_UDT (a UDT) which
violates the method's contract of converting UDTs to SQL types; change the
ExprCoreType.IP branch to use the OpenSearchTypeFactory to create a
SqlTypeName.OTHER SQL type with the same nullability (i.e., call
typeFactory.createSqlType(SqlTypeName.OTHER, t.isNullable())) so IP is
represented as SqlTypeName.OTHER during validation instead of a UDT.
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java (1)
471-492: Add the missing import for UDFOperandMetadata.
UDFOperandMetadata is referenced here but isn’t imported, which will fail compilation unless it’s in the same package. Add the import (or fully qualify the reference).
🛠️ Suggested fix
 import org.opensearch.sql.calcite.utils.PPLOperandTypes;
 import org.opensearch.sql.calcite.utils.PPLReturnTypes;
+import org.opensearch.sql.calcite.utils.UDFOperandMetadata;
 import org.opensearch.sql.calcite.utils.UserDefinedFunctionUtils;
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (2)
399-425: Add null guards for external operator registrations.
Fail fast if operator/aggFunction is null to avoid deferred NPEs during planning.
🔧 Proposed fix
 public void registerExternalOperator(BuiltinFunctionName functionName, SqlOperator operator) {
+  if (operator == null) {
+    throw new IllegalArgumentException(
+        String.format(Locale.ROOT, "Operator cannot be null for function %s", functionName));
+  }
   if (externalFunctionRegistry.containsKey(functionName)) {
     logger.warn(
         String.format(Locale.ROOT, "Function %s is registered multiple times", functionName));
   }
   externalFunctionRegistry.put(functionName, (builder, args) -> builder.makeCall(operator, args));
 }

 public void registerExternalAggOperator(
     BuiltinFunctionName functionName, SqlUserDefinedAggFunction aggFunction) {
+  if (aggFunction == null) {
+    throw new IllegalArgumentException(
+        String.format(Locale.ROOT, "Agg function cannot be null for %s", functionName));
+  }
   if (aggExternalFunctionRegistry.containsKey(functionName)) {
     logger.warn(
         String.format(
             Locale.ROOT, "Aggregate function %s is registered multiple times", functionName));
   }
   AggHandler handler =
       (distinct, field, argList, ctx) ->
           UserDefinedFunctionUtils.makeAggregateCall(
               aggFunction, List.of(field), argList, ctx.relBuilder);
   aggExternalFunctionRegistry.put(functionName, handler);
 }
428-475: Add JavaDoc for the public resolve methods.
These public APIs currently lack JavaDoc with @param/@return (and @throws if applicable).
As per coding guidelines, public methods MUST have JavaDoc with @param, @return, and @throws.
🤖 Fix all issues with AI agents
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 1860-1917: The makeOver call in embedExistingCollationsIntoOver is
missing the required RexWindowExclusion argument; update the call in method
embedExistingCollationsIntoOver (where context.rexBuilder.makeOver is invoked
for RexOver) to insert RexWindowExclusion.EXCLUDE_NO_OTHER between
window.getUpperBound() and window.isRows(), preserving the rest of the
parameters (type, agg operator, operands, partitionKeys,
orderKeys/rexCollations, lowerBound, upperBound, exclude, isRows,
allowPartial(true), isNullWhenCountZero(false), over.isDistinct(),
over.ignoreNulls()) so the parameter ordering matches RexBuilder.makeOver.
In `@core/src/main/java/org/opensearch/sql/calcite/DynamicFieldsHelper.java`:
- Around line 306-313: Add a unit test that exercises
DynamicFieldsHelper.createEmptyMap to cover the empty-keys map creation path and
assert the returned RexNode is a non-null NULL literal of MAP type (i.e., not an
unexpected Java null and its type is a MAP of VARCHAR->VARCHAR). Create a test
in the corresponding test class (e.g., DynamicFieldsHelperTest) that constructs
or mocks a CalcitePlanContext with a RexBuilder/TypeFactory (or reuse existing
test helpers), calls createEmptyMap(context), and asserts the result is not null
and result.getType().isMap() (and that the map key/value types are VARCHAR).
Ensure the new test method name clearly indicates it verifies empty-keys map
creation (e.g., testCreateEmptyMapReturnsMapNullLiteral).
- Around line 293-303: Add unit tests covering the two new private helpers
createEmptyMap() and getFieldsAsMap(): write tests that exercise
createEmptyMap() returning a NULL-cast-to-MAP literal and verify consumer
behavior (e.g., MapAppendFunctionImpl.mapAppend() preserves non-null maps when
one arg is NULL), and tests for getFieldsAsMap() producing the expected
MAP_FROM_ARRAYS RexNode for given keys; also update the JavaDoc on
createEmptyMap() to clarify it returns a NULL literal cast to MAP (not a
concrete empty map) so the comment matches behavior. Ensure tests live alongside
other core Java tests and reference the helper methods by name (createEmptyMap,
getFieldsAsMap) when locating/covering the code.
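The "NULL cast to MAP must not clobber the other side" behavior that the prompt asks tests to cover can be illustrated with plain java.util.Map; this sketch only mirrors the merge semantics, and the names (MapAppendSketch, mapAppend) are illustrative, not the real MapAppendFunctionImpl or its RexNode-based inputs:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of map-append semantics where a NULL map behaves like an empty map. */
public class MapAppendSketch {

  /** Merge two maps, treating a null argument like an empty map. */
  static Map<String, String> mapAppend(Map<String, String> a, Map<String, String> b) {
    Map<String, String> out = new LinkedHashMap<>();
    if (a != null) out.putAll(a);
    if (b != null) out.putAll(b); // later entries win on key collision
    return out;
  }

  public static void main(String[] args) {
    Map<String, String> fields = Map.of("status", "200");
    // The NULL "empty map" placeholder must preserve the non-null side:
    System.out.println(mapAppend(fields, null));
  }
}
```

A real unit test would assert the same invariant on the RexNode produced by createEmptyMap: appending it to a populated map leaves that map's entries intact.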
In
`@core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java`:
- Around line 370-388: In OpenSearchTypeFactory, add an explicit early check for
the all-NULL case before the current conditional chain: if nullCount ==
types.size() return the NULL SQL type (or delegate to the existing
leastRestrictive behavior) instead of falling through to the date/time/ip
checks; update the method that builds udt (the block that calls
createUDT(ExprUDT.EXPR_DATE, ...), createUDT(...EXPR_TIME...),
createUDT(...EXPR_IP...), createUDT(...EXPR_BINARY...), and
createSqlType(SqlTypeName.VARCHAR, ...)) to handle the all-null case first so an
all-NULL input does not incorrectly yield ExprUDT.EXPR_DATE.
🧹 Nitpick comments (3)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java (1)
160-165: Make the dynamic-field test assert extracted values (not just 0 rows).
verifyNumOfRows(result, 0) can pass even if dynamic-field extraction regresses, and it’s brittle if the logs fixture changes. Consider asserting at least one known row/value (or inserting a deterministic fixture) so the test validates actual SPATH extraction behavior.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
1839-1846: Track the new TODO with an issue or reference. This will make the push-down regression easier to triage later. I can open a follow-up issue if you want.
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java (1)
332-397: Add JavaDoc for the public leastRestrictive override.
Per coding guidelines, public methods should have JavaDoc with @param, @return, and @throws tags. This method implements important OpenSearch UDT semantics that differ from standard Calcite behavior and would benefit from documentation.
Additionally, the timestamp promotion logic on line 384 (binaryCount == 0 && ipCount == 0) includes cases where dateCount + timeCount + timestampCount + nullCount == types.size(), but timestampCount is never tracked. This appears intentional (any mix of date/time UDTs becomes timestamp), but documenting this behavior would improve clarity.
📝 Suggested JavaDoc
 @Override
+/**
+ * Returns the least restrictive type among the given types, with special handling for OpenSearch
+ * UDTs.
+ *
+ * <p>For UDTs:
+ * <ul>
+ *   <li>Homogeneous DATE/TIME/IP/BINARY types (plus NULLs) return the corresponding UDT</li>
+ *   <li>Mixed DATE/TIME/TIMESTAMP types are promoted to TIMESTAMP</li>
+ *   <li>Mixed UDT types that cannot be unified return VARCHAR</li>
+ * </ul>
+ *
+ * <p>For standard SQL types, delegates to Calcite with PPL coercion rules, converting CHAR to
+ * VARCHAR to avoid result padding.
+ *
+ * @param types the list of types to find the least restrictive common type for
+ * @return the least restrictive type, or null if none exists
+ */
 public @Nullable RelDataType leastRestrictive(List<RelDataType> types) {
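The promotion rules the suggested JavaDoc describes can be sketched with string tags standing in for RelDataType values; this toy (including the all-NULL guard requested in another comment on this file) is simplified from the review's reading of the code, not the actual OpenSearchTypeFactory logic:

```java
import java.util.List;

/** Toy rendering of the UDT least-restrictive promotion rules. */
public class LeastRestrictiveSketch {

  static String leastRestrictive(List<String> types) {
    long nulls = types.stream().filter("NULL"::equals).count();
    if (nulls == types.size()) return "NULL"; // all-NULL guard: do not default to DATE
    boolean allTemporalOrNull =
        types.stream().allMatch(t -> List.of("DATE", "TIME", "TIMESTAMP", "NULL").contains(t));
    if (allTemporalOrNull) {
      long date = types.stream().filter("DATE"::equals).count();
      long time = types.stream().filter("TIME"::equals).count();
      if (date + nulls == types.size()) return "DATE"; // homogeneous DATE (plus NULLs)
      if (time + nulls == types.size()) return "TIME"; // homogeneous TIME (plus NULLs)
      return "TIMESTAMP"; // any mix of temporal UDTs is promoted to TIMESTAMP
    }
    return "VARCHAR"; // UDT mixes that cannot be unified fall back to VARCHAR
  }

  public static void main(String[] args) {
    System.out.println(leastRestrictive(List.of("DATE", "TIMESTAMP")));
    System.out.println(leastRestrictive(List.of("NULL", "NULL")));
  }
}
```

Encoding the table this way also makes it easy to unit-test the promotion matrix exhaustively, which the real method currently lacks.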
| /** | ||
| * Embed existing collation into window function's over clauses. | ||
| * | ||
| * <p>Window functions with frame specifications like {@code ROWS n PRECEDING} require ORDER BY to | ||
| * determine row order. Without it, results are non-deterministic. | ||
| * | ||
| * <p>Without this fix, the initial plan has ORDER BY separate from window functions: | ||
| * | ||
| * <pre> | ||
| * LogicalProject(SUM($5) OVER (ROWS 1 PRECEDING)) ← Missing ORDER BY | ||
| * LogicalSort(sort0=[$5]) | ||
| * </pre> | ||
| * | ||
| * <p>This causes problems during validation as the order is not bound to the window. With this | ||
| * fix, sort collations are embedded into each {@code RexOver} window: | ||
| * | ||
| * <pre> | ||
| * LogicalProject(SUM($5) OVER (ORDER BY $5 ROWS 1 PRECEDING)) ← ORDER BY embedded | ||
| * </pre> | ||
| * | ||
| * @param overExpressions Window function expressions (may contain nested {@link RexOver}) | ||
| * @param context Plan context for building RexNodes | ||
| * @return Expressions with ORDER BY embedded in all window specifications | ||
| */ | ||
| private List<RexNode> embedExistingCollationsIntoOver( | ||
| List<RexNode> overExpressions, CalcitePlanContext context) { | ||
| RelCollation existingCollation = context.relBuilder.peek().getTraitSet().getCollation(); | ||
| List<@NonNull RelFieldCollation> relCollations = | ||
| existingCollation == null ? List.of() : existingCollation.getFieldCollations(); | ||
| ImmutableList<@NonNull RexFieldCollation> rexCollations = | ||
| relCollations.stream() | ||
| .map(f -> relCollationToRexCollation(f, context.relBuilder)) | ||
| .collect(ImmutableList.toImmutableList()); | ||
| return overExpressions.stream() | ||
| .map( | ||
| n -> | ||
| n.accept( | ||
| new RexShuttle() { | ||
| @Override | ||
| public RexNode visitOver(RexOver over) { | ||
| RexWindow window = over.getWindow(); | ||
| return context.rexBuilder.makeOver( | ||
| over.getType(), | ||
| over.getAggOperator(), | ||
| over.getOperands(), | ||
| window.partitionKeys, | ||
| rexCollations, | ||
| window.getLowerBound(), | ||
| window.getUpperBound(), | ||
| window.isRows(), | ||
| true, | ||
| false, | ||
| over.isDistinct(), | ||
| over.ignoreNulls()); | ||
| } | ||
| })) | ||
| .collect(Collectors.toList()); | ||
| } |
🧩 Analysis chain: verification scripts were run to locate the `RexBuilder.makeOver` call sites in `CalciteRelNodeVisitor.java` and `ExtendedRelJson.java`, compare their parameter lists, and check whether `RexOver` exposes getters for `allowPartial` and `nullWhenCountZero` (it does not). Script outputs are elided; the findings follow.
Fix missing RexWindowExclusion parameter in makeOver call.
The makeOver call is missing a required parameter. Comparing with ExtendedRelJson.java (line 448), the Calcite RexBuilder.makeOver signature expects RexWindowExclusion at parameter position 8, but the current code passes window.isRows() instead.
The correct order should be:
- type
- operator
- operands
- partitionKeys
- orderKeys
- lowerBound
- upperBound
- exclude (RexWindowExclusion) — currently missing
- followed by remaining parameters
Add the missing exclude parameter (defaulting to RexWindowExclusion.EXCLUDE_NO_OTHER like in ExtendedRelJson):
```java
return context.rexBuilder.makeOver(
    over.getType(),
    over.getAggOperator(),
    over.getOperands(),
    window.partitionKeys,
    rexCollations,
    window.getLowerBound(),
    window.getUpperBound(),
    RexWindowExclusion.EXCLUDE_NO_OTHER, // Add this
    window.isRows(),
    true,
    false,
    over.isDistinct(),
    over.ignoreNulls());
```

The original concern about preserving allowPartial and isNullWhenCountZero from the source RexOver is not actionable: the RexOver API does not expose these properties as getters, making hardcoding acceptable.
🤖 Prompt for AI Agents
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`
around lines 1860 - 1917, The makeOver call in embedExistingCollationsIntoOver
is missing the required RexWindowExclusion argument; update the call in method
embedExistingCollationsIntoOver (where context.rexBuilder.makeOver is invoked
for RexOver) to insert RexWindowExclusion.EXCLUDE_NO_OTHER between
window.getUpperBound() and window.isRows(), preserving the rest of the
parameters (type, agg operator, operands, partitionKeys,
orderKeys/rexCollations, lowerBound, upperBound, exclude, isRows,
allowPartial(true), isNullWhenCountZero(false), over.isDistinct(),
over.ignoreNulls()) so the parameter ordering matches RexBuilder.makeOver.
```java
// Handle empty keys case - Calcite ARRAY[] requires at least 1 element
if (keys.isEmpty()) {
  return createEmptyMap(context);
}
RexNode keysArray = getStringLiteralArray(keys, context);
List<RexNode> values =
    keys.stream().map(key -> context.relBuilder.field(key)).collect(Collectors.toList());
keys.stream()
    .map(key -> context.relBuilder.cast(context.relBuilder.field(key), SqlTypeName.VARCHAR))
    .collect(Collectors.toList());
RexNode valuesArray = makeArray(values, context);
return context.rexBuilder.makeCall(BuiltinFunctionName.MAP_FROM_ARRAYS, keysArray, valuesArray);
```
🧩 Analysis chain: verification scripts were run to locate `createEmptyMap` and `getFieldsAsMap` in `DynamicFieldsHelper.java`, search for existing test coverage (none was found), and confirm the NULL-handling behavior of `MAP_APPEND` in `MapAppendFunctionImpl` and `MapAppendFunctionIT`. Script outputs are elided; the findings follow.
Add unit tests for new helper functions per coding guidelines.
The coding guidelines require that new functions in core/src/main/java/**/*.java must have unit tests in the same commit. Both createEmptyMap() and getFieldsAsMap() are new private utilities and need test coverage.
Additionally, note that createEmptyMap() returns a NULL literal (not an actual empty map), which is semantically different from its JavaDoc comment "Create an empty map by casting NULL to MAP type". However, this is not a functional bug: MAP_APPEND has explicit null-handling logic (see MapAppendFunctionImpl.mapAppend() lines 69-81) that returns the non-null map when one argument is NULL, so the dynamic fields map will not be wiped out. The existing test suite confirms this behavior (MapAppendFunctionIT lines 87-113). Still, consider updating the comment for clarity.
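The null-handling contract described above can be illustrated with a minimal standalone sketch. This is a simplified plain-Java model: `mapAppend` below is a hypothetical stand-in operating on `java.util.Map`, not the real `MapAppendFunctionImpl.mapAppend` signature.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the MAP_APPEND null-handling contract: when one
// argument is NULL, the non-null map is returned unchanged, so a NULL
// "empty map" placeholder does not wipe out the dynamic fields map.
public class MapAppendSketch {
  static Map<String, String> mapAppend(Map<String, String> left, Map<String, String> right) {
    if (left == null) return right;   // NULL placeholder: keep the other side
    if (right == null) return left;
    Map<String, String> merged = new HashMap<>(left);
    merged.putAll(right);             // right-hand entries win on key conflicts
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> dynamic = Map.of("k", "v");
    // Appending a NULL map is a no-op on the non-null side:
    System.out.println(mapAppend(null, dynamic)); // {k=v}
  }
}
```

Under this model, `createEmptyMap()` returning a NULL literal rather than a concrete empty map is observationally equivalent for the append path, which matches the behavior the integration tests confirm.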
🤖 Prompt for AI Agents
In `@core/src/main/java/org/opensearch/sql/calcite/DynamicFieldsHelper.java`
around lines 293 - 303, Add unit tests covering the two new private helpers
createEmptyMap() and getFieldsAsMap(): write tests that exercise
createEmptyMap() returning a NULL-cast-to-MAP literal and verify consumer
behavior (e.g., MapAppendFunctionImpl.mapAppend() preserves non-null maps when
one arg is NULL), and tests for getFieldsAsMap() producing the expected
MAP_FROM_ARRAYS RexNode for given keys; also update the JavaDoc on
createEmptyMap() to clarify it returns a NULL literal cast to MAP (not a
concrete empty map) so the comment matches behavior. Ensure tests live alongside
other core Java tests and reference the helper methods by name (createEmptyMap,
getFieldsAsMap) when locating/covering the code.
```java
/** Create an empty map by casting NULL to MAP type */
private static RexNode createEmptyMap(CalcitePlanContext context) {
  RelDataType varcharType =
      context.rexBuilder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR);
  RelDataType mapType =
      context.rexBuilder.getTypeFactory().createMapType(varcharType, varcharType);
  return context.rexBuilder.makeNullLiteral(mapType);
}
```
Add unit test coverage for empty-keys map creation.
A new helper function was added to core/src/main/java; please add a unit test that exercises the empty-keys path (and ensures _MAP is not unexpectedly NULL).
As per coding guidelines: New functions MUST have unit tests in the same commit.
🤖 Prompt for AI Agents
In `@core/src/main/java/org/opensearch/sql/calcite/DynamicFieldsHelper.java`
around lines 306 - 313, Add a unit test that exercises
DynamicFieldsHelper.createEmptyMap to cover the empty-keys map creation path and
assert the returned RexNode is a non-null NULL literal of MAP type (i.e., not an
unexpected Java null and its type is a MAP of VARCHAR->VARCHAR). Create a test
in the corresponding test class (e.g., DynamicFieldsHelperTest) that constructs
or mocks a CalcitePlanContext with a RexBuilder/TypeFactory (or reuse existing
test helpers), calls createEmptyMap(context), and asserts the result is not null
and result.getType().isMap() (and that the map key/value types are VARCHAR).
Ensure the new test method name clearly indicates it verifies empty-keys map
creation (e.g., testCreateEmptyMapReturnsMapNullLiteral).
```java
if (dateCount + nullCount == types.size()) {
  udt = createUDT(ExprUDT.EXPR_DATE, nullable);
} else if (timeCount + nullCount == types.size()) {
  udt = createUDT(ExprUDT.EXPR_TIME, nullable);
}
// There are cases where UDT IP interleaves with its intermediate SQL type for validation
// OTHER, we check otherCount to patch such cases
else if (ipCount + nullCount == types.size() || otherCount + nullCount == types.size()) {
  udt = createUDT(ExprUDT.EXPR_IP, nullable);
} else if (binaryCount + nullCount == types.size()) {
  udt = createUDT(ExprUDT.EXPR_BINARY, nullable);
}
// There exists a mix of time, date, and timestamp (and optionally null)
else if (binaryCount == 0 && ipCount == 0) {
  udt = createUDT(ExprUDT.EXPR_TIMESTAMP, nullable);
} else {
  udt = createSqlType(SqlTypeName.VARCHAR, nullable);
}
return udt;
```
Edge case: all-NULL input may incorrectly return DATE UDT.
When types contains only NULL types, nullCount == types.size() and dateCount + nullCount == types.size() evaluates to true (since dateCount == 0), causing the method to return EXPR_DATE. This may not be the intended behavior—an all-NULL list should likely return NULL type or delegate to the standard leastRestrictive.
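A standalone illustration of the arithmetic (the variable names mirror the counters in the review, but this snippet is not the project code):

```java
class AllNullCounts {
  public static void main(String[] args) {
    int size = 3;      // types contains three NULL types
    int dateCount = 0; // no DATE present
    int nullCount = 3; // every element is NULL
    // The guard is satisfied even though no DATE type appears,
    // which is why the branch would pick EXPR_DATE for an all-NULL list.
    boolean picksDate = dateCount + nullCount == size;
    System.out.println(picksDate); // true
  }
}
```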
🐛 Proposed fix
```diff
   if (anyCount == 0) {
     RelDataType udt;
     boolean nullable = nullableCount > 0 || nullCount > 0;
-    if (dateCount + nullCount == types.size()) {
+    if (dateCount > 0 && dateCount + nullCount == types.size()) {
       udt = createUDT(ExprUDT.EXPR_DATE, nullable);
-    } else if (timeCount + nullCount == types.size()) {
+    } else if (timeCount > 0 && timeCount + nullCount == types.size()) {
       udt = createUDT(ExprUDT.EXPR_TIME, nullable);
     }
     // There are cases where UDT IP interleaves with its intermediate SQL type for validation
     // OTHER, we check otherCount to patch such cases
-    else if (ipCount + nullCount == types.size() || otherCount + nullCount == types.size()) {
+    else if ((ipCount > 0 && ipCount + nullCount == types.size()) || (otherCount > 0 && otherCount + nullCount == types.size())) {
       udt = createUDT(ExprUDT.EXPR_IP, nullable);
-    } else if (binaryCount + nullCount == types.size()) {
+    } else if (binaryCount > 0 && binaryCount + nullCount == types.size()) {
       udt = createUDT(ExprUDT.EXPR_BINARY, nullable);
     }
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```java
if (dateCount > 0 && dateCount + nullCount == types.size()) {
  udt = createUDT(ExprUDT.EXPR_DATE, nullable);
} else if (timeCount > 0 && timeCount + nullCount == types.size()) {
  udt = createUDT(ExprUDT.EXPR_TIME, nullable);
}
// There are cases where UDT IP interleaves with its intermediate SQL type for validation
// OTHER, we check otherCount to patch such cases
else if ((ipCount > 0 && ipCount + nullCount == types.size()) || (otherCount > 0 && otherCount + nullCount == types.size())) {
  udt = createUDT(ExprUDT.EXPR_IP, nullable);
} else if (binaryCount > 0 && binaryCount + nullCount == types.size()) {
  udt = createUDT(ExprUDT.EXPR_BINARY, nullable);
}
// There exists a mix of time, date, and timestamp (and optionally null)
else if (binaryCount == 0 && ipCount == 0) {
  udt = createUDT(ExprUDT.EXPR_TIMESTAMP, nullable);
} else {
  udt = createSqlType(SqlTypeName.VARCHAR, nullable);
}
return udt;
```
🤖 Prompt for AI Agents
In
`@core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java`
around lines 370 - 388, In OpenSearchTypeFactory, add an explicit early check
for the all-NULL case before the current conditional chain: if nullCount ==
types.size() return the NULL SQL type (or delegate to the existing
leastRestrictive behavior) instead of falling through to the date/time/ip
checks; update the method that builds udt (the block that calls
createUDT(ExprUDT.EXPR_DATE, ...), createUDT(...EXPR_TIME...),
createUDT(...EXPR_IP...), createUDT(...EXPR_BINARY...), and
createSqlType(SqlTypeName.VARCHAR, ...)) to handle the all-null case first so an
all-NULL input does not incorrectly yield ExprUDT.EXPR_DATE.
@coderabbitai nope, all null casts won't pass the initial condition if (types.stream().anyMatch(OpenSearchTypeUtil::isUserDefinedType)), thus won't fall into the date UDT case.
@yuancu You’re right—thanks for the correction. Because the UDT branch in OpenSearchTypeFactory.leastRestrictive() executes only when at least one input type is a UDT (via OpenSearchTypeUtil.isUserDefinedType), an all-NULL list won’t enter that block and will fall back to the standard leastRestrictive(types, PplTypeCoercionRule.assignmentInstance()) path. The earlier suggestion about EXPR_DATE on all-NULL was incorrect—no change needed.
As a minor guardrail, consider a unit test asserting the all-NULL case takes the standard path, but otherwise this looks good.
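The guard in question can be mimicked without Calcite. `isUserDefinedType` below is a hypothetical predicate standing in for `OpenSearchTypeUtil::isUserDefinedType`, and `Kind` is an invented enum for illustration:

```java
import java.util.List;
import java.util.stream.Stream;

class UdtGuardSketch {
  enum Kind { NULL, DATE, IP_UDT }

  // Hypothetical stand-in for OpenSearchTypeUtil::isUserDefinedType
  static boolean isUserDefinedType(Kind k) {
    return k == Kind.IP_UDT;
  }

  public static void main(String[] args) {
    List<Kind> allNull = List.of(Kind.NULL, Kind.NULL, Kind.NULL);
    // An all-NULL list never enters the UDT branch...
    System.out.println(allNull.stream().anyMatch(UdtGuardSketch::isUserDefinedType)); // false
    // ...while any UDT member does.
    System.out.println(
        Stream.of(Kind.IP_UDT, Kind.NULL).anyMatch(UdtGuardSketch::isUserDefinedType)); // true
  }
}
```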
✏️ Learnings added
Learnt from: yuancu
Repo: opensearch-project/sql PR: 4892
File: core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java:370-388
Timestamp: 2026-01-29T06:05:23.585Z
Learning: OpenSearchTypeFactory.leastRestrictive(List<RelDataType>) enters the UDT-specific logic only if any input is a user-defined type (OpenSearchTypeUtil.isUserDefinedType). An all-NULL input list does not enter this branch and uses the standard leastRestrictive with PplTypeCoercionRule.assignmentInstance() instead.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2026-01-26T22:19:54.446Z
Learning: Applies to **/calcite/**/*.java : Ensure version compatibility with Calcite in integration code
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2026-01-26T22:19:54.446Z
Learning: Applies to **/calcite/**/*.java : Document Calcite-specific workarounds in code comments
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2026-01-26T22:19:54.446Z
Learning: Applies to **/*.java : Maintain Java 11 compatibility when possible for OpenSearch 2.x support
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)
393-413: Add complete JavaDoc for these public APIs.
`resolve`/`resolveAgg` lack JavaDoc, and the `registerExternal*` methods don't include full tag coverage. Please add `@param`, `@return`, and `@throws` as required.
As per coding guidelines: Public methods MUST have JavaDoc with `@param`, `@return`, and `@throws`. Also applies to: 428-475
🤖 Fix all issues with AI agents
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 1899-1913: The current embedExistingCollationsIntoOver logic
always overwrites window ORDER BY keys (rexCollations) when building RexOver,
dropping any explicit window ordering; change the visitOver in
embedExistingCollationsIntoOver (the code that calls context.rexBuilder.makeOver
with rexCollations) to first inspect the RexWindow (window) for existing order
keys and only use the plan collation (rexCollations) when the window has no
explicit ordering, otherwise pass the window's existing order keys through
unchanged so explicit window ORDER BY is preserved.
In
`@core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java`:
- Around line 477-483: The file references UDFOperandMetadata in the
INTERNAL_PATTERN declaration inside PPLBuiltinOperators
(createUserDefinedAggFunction call for LogPatternAggFunction "pattern") but
missing its import causes compilation failure; add the proper import for
UDFOperandMetadata at the top of the file (matching the package where
UDFOperandMetadata is defined) so the reference in the INTERNAL_PATTERN constant
resolves.
- Around line 494-520: Add unit tests covering the MAP_FROM_ARRAYS function
declared as PPLBuiltinOperators.MAP_FROM_ARRAYS: create tests that (1) verify
correct map output when given two arrays of matching length (e.g., keys and
values), (2) assert behavior with empty arrays, (3) assert handling of null
inputs and null elements, and (4) validate behavior when key/value arrays have
mismatched lengths (expected error or defined behavior). Put tests in the same
test suite that covers other PPLBuiltinOperators (reuse existing test utilities
used for ATAN/REGEXP/MAP_CONCAT) and invoke the function through the same
evaluation path used by other function tests so the SqlFunction resolution and
type handling (SqlLibraryOperators -> PPLBuiltinOperators.MAP_FROM_ARRAYS) are
exercised.
In
`@core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java`:
- Around line 414-425: registerExternalAggOperator currently calls
UserDefinedFunctionUtils.makeAggregateCall with List.of(field) which NPEs for
zero-arg aggregates when field is null; modify the AggHandler lambda in
registerExternalAggOperator to pass an empty list (e.g., Collections.emptyList()
or List.of()) when field==null and List.of(field) only when field!=null, then
call makeAggregateCall(aggFunction, fieldList, argList, ctx.relBuilder) so
zero-arg external aggregates (like count()) are handled safely; keep the same
functionName->handler registration in aggExternalFunctionRegistry.
```java
public RexNode visitOver(RexOver over) {
  RexWindow window = over.getWindow();
  return context.rexBuilder.makeOver(
      over.getType(),
      over.getAggOperator(),
      over.getOperands(),
      window.partitionKeys,
      rexCollations,
      window.getLowerBound(),
      window.getUpperBound(),
      window.isRows(),
      true,
      false,
      over.isDistinct(),
      over.ignoreNulls());
```
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the full context around the visitOver method
cat -n core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java | sed -n '1880,1930p'
Repository: opensearch-project/sql
Length of output: 2823
🏁 Script executed:
# Search for where rexCollations is defined in the visitOver context
grep -n "rexCollations" core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java | head -20Repository: opensearch-project/sql
Length of output: 180
🏁 Script executed:
# Look for the broader method context to understand how rexCollations is populated
ast-grep --pattern 'visitOver($$$) {
$$$
}'
Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Search for RexWindow.orderKeys references
rg "window\.orderKeys" core/src/main/java/org/opensearch/sql/calcite/Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Search for RexWindow creation or usage patterns
rg "RexWindow" core/src/main/java/org/opensearch/sql/calcite/ -A 3 -B 1Repository: opensearch-project/sql
Length of output: 14719
🏁 Script executed:
# Look for similar patterns of window handling elsewhere in the file
rg "makeOver\(" core/src/main/java/org/opensearch/sql/calcite/ -B 5 -A 5Repository: opensearch-project/sql
Length of output: 12259
🏁 Script executed:
# Check if there are any imports or usage of RexWindow that show field names
grep -n "import.*RexWindow\|RexWindow\." core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.javaRepository: opensearch-project/sql
Length of output: 159
🏁 Script executed:
# Search for window.orderKeys usage in the codebase
rg "\.orderKeys" core/src/main/java/ -t javaRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Search for window field access patterns
rg "window\.(partition|order|collation)" core/src/main/java/ -t javaRepository: opensearch-project/sql
Length of output: 188
🏁 Script executed:
# Check pom.xml for Calcite version
cat pom.xml | grep -A 2 "calcite" | head -20
Repository: opensearch-project/sql
Length of output: 105
🌐 Web query:
Apache Calcite RexWindow class API orderKeys field structure
💡 Result:
In org.apache.calcite.rex.RexWindow, the orderKeys field is:
`public final ImmutableList<RexFieldCollation> orderKeys`, an ordered list describing the window's ORDER BY items. [1]
Each element is structured as `RexFieldCollation extends Pair<RexNode, ImmutableSet<SqlKind>>`:
- left (`RexNode`): the sort expression (e.g., an input ref, computed expression, etc.)
- right (`Set<SqlKind>`): sort flags such as DESCENDING and null-direction flags (e.g., NULLS FIRST/LAST) [2]

Convenience accessors on RexFieldCollation interpret those flags as enums: `getDirection()` returns `RelFieldCollation.Direction` (ASC/DESC); `getNullDirection()` returns `RelFieldCollation.NullDirection` [2]
Sources: [1] [2]
🏁 Script executed:
# Search for RexBuilder.makeOver method to see its signature
rg "makeOver\(" --type java -B 2 -A 8 | head -100Repository: opensearch-project/sql
Length of output: 10225
🏁 Script executed:
# Look for patterns where window.orderKeys is checked or used
find . -name "*.java" -type f | xargs grep -l "orderKeys" 2>/dev/null | head -10
Repository: opensearch-project/sql
Length of output: 322
🏁 Script executed:
# Search for the method name "embedExistingCollationsIntoOver" and surrounding context
rg "embedExistingCollationsIntoOver" -B 5 -A 30Repository: opensearch-project/sql
Length of output: 8562
🌐 Web query:
RexBuilder.makeOver signature parameter orderKeys Calcite
💡 Result:
In Apache Calcite, RexBuilder.makeOver takes orderKeys after partitionKeys.
Signature (current javadoc):
makeOver(RelDataType type, SqlAggFunction operator, List<RexNode> exprs, List<RexNode> partitionKeys, ImmutableList<RexFieldCollation> orderKeys, RexWindowBound lowerBound, RexWindowBound upperBound, RexWindowExclusion exclude, boolean rows, boolean allowPartial, boolean nullWhenCountZero, boolean distinct, boolean ignoreNulls) [1]
Older/shorter overload (no exclude, no ignoreNulls):
makeOver(RelDataType type, SqlAggFunction operator, List<RexNode> exprs, List<RexNode> partitionKeys, ImmutableList<RexFieldCollation> orderKeys, RexWindowBound lowerBound, RexWindowBound upperBound, boolean physical, boolean allowPartial, boolean nullWhenCountZero, boolean distinct) [2]
So the relevant order is: exprs, partitionKeys, orderKeys, then bounds/flags. [1][2]
Preserve existing window ORDER BY when embedding plan collations.
The embedExistingCollationsIntoOver method unconditionally replaces all RexOver order keys with the plan collation, dropping any explicit window ORDER BY. This can change query semantics. Only inject the plan collation when the window lacks existing order keys.
🛠️ Suggested fix
```diff
-    return context.rexBuilder.makeOver(
+    ImmutableList<@NonNull RexFieldCollation> orderKeys =
+        window.orderKeys.isEmpty() ? rexCollations : window.orderKeys;
+    return context.rexBuilder.makeOver(
         over.getType(),
         over.getAggOperator(),
         over.getOperands(),
         window.partitionKeys,
-        rexCollations,
+        orderKeys,
         window.getLowerBound(),
         window.getUpperBound(),
         window.isRows(),
         true,
         false,
         over.isDistinct(),
         over.ignoreNulls());
```
🤖 Prompt for AI Agents
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`
around lines 1899 - 1913, The current embedExistingCollationsIntoOver logic
always overwrites window ORDER BY keys (rexCollations) when building RexOver,
dropping any explicit window ordering; change the visitOver in
embedExistingCollationsIntoOver (the code that calls context.rexBuilder.makeOver
with rexCollations) to first inspect the RexWindow (window) for existing order
keys and only use the plan collation (rexCollations) when the window has no
explicit ordering, otherwise pass the window's existing order keys through
unchanged so explicit window ORDER BY is preserved.
core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
```java
public void registerExternalAggOperator(
    BuiltinFunctionName functionName, SqlUserDefinedAggFunction aggFunction) {
  PPLTypeChecker typeChecker =
      wrapSqlOperandTypeChecker(aggFunction.getOperandTypeChecker(), functionName.name(), true);
  CalciteFuncSignature signature = new CalciteFuncSignature(functionName.getName(), typeChecker);
  if (aggExternalFunctionRegistry.containsKey(functionName)) {
    logger.warn(
        String.format(
            Locale.ROOT, "Aggregate function %s is registered multiple times", functionName));
  }
  AggHandler handler =
      (distinct, field, argList, ctx) ->
          UserDefinedFunctionUtils.makeAggregateCall(
              aggFunction, List.of(field), argList, ctx.relBuilder);
  aggExternalFunctionRegistry.put(functionName, Pair.of(signature, handler));
}

public List<RexNode> validateAggFunctionSignature(
    BuiltinFunctionName functionName,
    RexNode field,
    List<RexNode> argList,
    RexBuilder rexBuilder) {
  var implementation = getImplementation(functionName);
  return validateFunctionArgs(implementation, functionName, field, argList, rexBuilder);
aggExternalFunctionRegistry.put(functionName, handler);
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's examine the file structure and locate the method in question
cd "$(git rev-parse --show-toplevel)" 2>/dev/null || cd .
# Find the file and show context around the method
fd -t f "PPLFuncImpTable.java" --exec cat -n {} \; | head -100
Repository: opensearch-project/sql
Length of output: 8647
🏁 Script executed:
#!/bin/bash
# Search for all calls to registerExternalAggOperator
rg -n "registerExternalAggOperator" --type=java -C 5Repository: opensearch-project/sql
Length of output: 3156
🏁 Script executed:
#!/bin/bash
# Search for AggHandler usages to understand how field is passed
rg -n "AggHandler" --type=java -C 3Repository: opensearch-project/sql
Length of output: 13249
🏁 Script executed:
#!/bin/bash
# Look for resolveAgg and similar methods that invoke handlers
rg -n "aggExternalFunctionRegistry|resolveAgg" --type=java -C 5Repository: opensearch-project/sql
Length of output: 10778
🏁 Script executed:
#!/bin/bash
# Look at the built-in aggregate handler implementations in AggBuilder
cat -n core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java | sed -n '996,1050p'
Repository: opensearch-project/sql
Length of output: 2934
🏁 Script executed:
#!/bin/bash
# Search for calls to resolveAgg to see how field is passed
rg -n "\.resolveAgg\(" --type=java -B 2 -A 2Repository: opensearch-project/sql
Length of output: 574
🏁 Script executed:
#!/bin/bash
# Search for zero-arg aggregate functions or COUNT() without field
rg -n "COUNT|count\(\s*\)" --type=java | head -30Repository: opensearch-project/sql
Length of output: 4974
🏁 Script executed:
#!/bin/bash
# Look for how COUNT(*) is handled in plan building
rg -n "RexBuilder\|relBuilder.*COUNT\|COUNT.*null" --type=java -B 2 -A 2 | head -50Repository: opensearch-project/sql
Length of output: 48
Guard against null field for external aggregates.
List.of(field) throws NPE when field is null, which occurs for zero-arg aggregates like count(). The built-in COUNT() aggregate explicitly handles this case; external aggregates should follow the same pattern to prevent planner crashes.
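`List.of` rejecting null elements is standard JDK behavior, easy to confirm in isolation. This snippet only demonstrates the JDK semantics and the null-guard pattern; it is not the project's handler code:

```java
import java.util.List;

class ListOfNullDemo {
  public static void main(String[] args) {
    try {
      List.of((Object) null); // List.of performs an eager null check
    } catch (NullPointerException e) {
      System.out.println("NPE from List.of(null)");
    }
    // The safe pattern: fall back to an empty list when the field is absent.
    Object field = null;
    List<Object> fields = field == null ? List.of() : List.of(field);
    System.out.println(fields.size()); // 0
  }
}
```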
🛠️ Suggested fix
```diff
-    AggHandler handler =
-        (distinct, field, argList, ctx) ->
-            UserDefinedFunctionUtils.makeAggregateCall(
-                aggFunction, List.of(field), argList, ctx.relBuilder);
+    AggHandler handler =
+        (distinct, field, argList, ctx) -> {
+          List<RexNode> fields = field == null ? List.of() : List.of(field);
+          return UserDefinedFunctionUtils.makeAggregateCall(
+              aggFunction, fields, argList, ctx.relBuilder);
+        };
```
🤖 Prompt for AI Agents
In
`@core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java`
around lines 414 - 425, registerExternalAggOperator currently calls
UserDefinedFunctionUtils.makeAggregateCall with List.of(field) which NPEs for
zero-arg aggregates when field is null; modify the AggHandler lambda in
registerExternalAggOperator to pass an empty list (e.g., Collections.emptyList()
or List.of()) when field==null and List.of(field) only when field!=null, then
call makeAggregateCall(aggFunction, fieldList, argList, ctx.relBuilder) so
zero-arg external aggregates (like count()) are handled safely; keep the same
functionName->handler registration in aggExternalFunctionRegistry.
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java`:
- Line 47: Add unit tests for PplTypeCoercion and add missing JavaDoc: write
focused JUnit tests exercising PplTypeCoercion (class PplTypeCoercion) methods
including null inputs, error paths and boundary cases for datetime and IP
conversions, numeric widening, and SAFE_CAST behavior (including safe/unsafe
results), and assert expected types/results and exceptions; also add JavaDoc to
the PplTypeCoercion constructor and all public methods with clear descriptions
and `@param`/`@return` (and `@throws` where applicable) to match project standards so
reviewers can see intended behavior and edge-case handling.
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In
`@core/src/test/java/org/opensearch/sql/calcite/validate/OpenSearchSparkSqlDialectTest.java`:
- Around line 30-41: Add two unit tests in OpenSearchSparkSqlDialectTest that
call OpenSearchSparkSqlDialect.DEFAULT.getCastSpec: one that passes a non‑IP
RelDataType (e.g., TYPE_FACTORY.createSqlType(SqlTypeName.INTEGER)) and asserts
the method returns null (or not a SqlDataTypeSpec) for unsupported types, and
another that calls getCastSpec(null) and asserts it throws a
NullPointerException (using assertThrows). Reference the existing test pattern
that inspects SqlDataTypeSpec (castSpec/SqlDataTypeSpec) so the new tests mirror
the IP test but cover non‑IP and null/boundary cases.
In
`@core/src/test/java/org/opensearch/sql/calcite/validate/PplTypeCoercionRuleTest.java`:
- Around line 40-50: The test testVarcharToOtherCoercion's comment is incorrect:
it states "coercible from OTHER" while the code checks
mapping.get(SqlTypeName.OTHER) contains VARCHAR/CHAR (i.e., VARCHAR can be
coerced to OTHER); update the inline comment above the assertion to correctly
read that VARCHAR is coercible to OTHER (or reword to "OTHER should allow
coercion to VARCHAR/CHAR") so the comment matches the logic in
PplTypeCoercionRule.instance() and mapping.get(SqlTypeName.OTHER).
In
`@core/src/test/java/org/opensearch/sql/calcite/validate/shuttles/SkipRelValidationShuttleTest.java`:
- Around line 33-36: Add unit tests to SkipRelValidationShuttleTest to cover
NULL, empty, and invalid cases for the skip predicate logic: add tests that pass
null operands to the shuttle (e.g., simulate a RexCall or RexNode with a null
operand), tests that pass an empty operand list, and negative tests for invalid
aggregate-input shapes (e.g., a single CASE expression or a non-project input)
to assert the shuttle rejects or throws as expected; create clearly named test
methods (e.g., testSkipPredicateWithNullOperand,
testSkipPredicateWithEmptyOperands, testSkipPredicateWithInvalidAggregateInput)
and exercise the same validation paths used by the existing happy-path tests so
you get NULL/boundary/error coverage for SkipRelValidationShuttle and relevant
methods in that class.
In
`@core/src/test/java/org/opensearch/sql/calcite/validate/ValidationUtilsTest.java`:
- Around line 209-232: Add a unit test in ValidationUtilsTest to cover a
null-exception input to ValidationUtils.tolerantValidationException: create a
test method (e.g., testTolerantValidationExceptionNullException) that passes
null as the exception argument and asserts the expected boolean result (likely
false based on other tests); place it alongside the existing tests in the same
class so the null-exception case is covered per the NULL input testing
guideline.
- Around line 98-207: Add null-input unit tests for
ValidationUtils.createUDTWithAttributes: verify that passing sourceType == null
and typeName == null to both overloads (the one taking ExprUDT and the one
taking SqlTypeName) throws a NullPointerException (use
assertThrows(NullPointerException.class, ...)). Add four tests in
ValidationUtilsTest such as testCreateUDTWithAttributes_NullSourceType_ExprUDT,
testCreateUDTWithAttributes_NullTypeName_ExprUDT,
testCreateUDTWithAttributes_NullSourceType_SqlTypeName, and
testCreateUDTWithAttributes_NullTypeName_SqlTypeName that call
ValidationUtils.createUDTWithAttributes(TYPE_FACTORY or null,
null/ExprUDT.EXPR_DATE) and the SqlTypeName variant accordingly and assert the
expected exception.
🧹 Nitpick comments (3)
core/src/test/java/org/opensearch/sql/calcite/validate/shuttles/SqlRewriteShuttleTest.java (2)
36-94: Consider adding NULL input test for identifier visit. Per coding guidelines, NULL input tests should be included for all new functions. Consider adding a test for how `shuttle.visit()` handles a null `SqlIdentifier` input, or document that null inputs are not expected at this layer.

```java
@Test
public void testVisitIdentifierWithNull() {
  SqlNode result = shuttle.visit((SqlIdentifier) null);
  // Assert expected behavior (null return, exception, etc.)
}
```
174-189: Consider adding symmetric test for NOT IN with non-SqlNodeList. For completeness and symmetry with the IN tests, consider adding a test that verifies NOT IN with a non-SqlNodeList first operand also remains unchanged.

```java
@Test
public void testVisitCallNotInWithNonSqlNodeListUnchanged() {
  SqlIdentifier column = new SqlIdentifier("my_column", SqlParserPos.ZERO);
  SqlIdentifier subquery = new SqlIdentifier("subquery", SqlParserPos.ZERO);
  SqlBasicCall notInCall =
      new SqlBasicCall(SqlStdOperatorTable.NOT_IN, List.of(column, subquery), SqlParserPos.ZERO);
  SqlNode result = shuttle.visit(notInCall);
  assertInstanceOf(SqlBasicCall.class, result);
  SqlBasicCall resultCall = (SqlBasicCall) result;
  assertEquals(SqlKind.NOT_IN, resultCall.getKind());
  assertInstanceOf(SqlIdentifier.class, resultCall.getOperandList().get(0));
}
```

core/src/test/java/org/opensearch/sql/calcite/validate/PplTypeCoercionRuleTest.java (1)
17-67: Consider adding tests for `lenientInstance()` and `assignmentInstance()`. The `PplTypeCoercionRule` class exposes three public factory methods: `instance()`, `lenientInstance()`, and `assignmentInstance()`. Currently, only `instance()` is tested. Adding similar tests for the other two would ensure the PPL-specific mappings are correctly merged with both `SqlTypeCoercionRule.lenientInstance()` and `SqlTypeAssignmentRule.instance()` base mappings.
core/src/test/java/org/opensearch/sql/calcite/validate/OpenSearchSparkSqlDialectTest.java
core/src/test/java/org/opensearch/sql/calcite/validate/PplTypeCoercionRuleTest.java
...src/test/java/org/opensearch/sql/calcite/validate/shuttles/SkipRelValidationShuttleTest.java
Show resolved
Hide resolved
core/src/test/java/org/opensearch/sql/calcite/validate/ValidationUtilsTest.java
Show resolved
Hide resolved
core/src/test/java/org/opensearch/sql/calcite/validate/ValidationUtilsTest.java
Show resolved
Hide resolved
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In
`@core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactoryTest.java`:
- Around line 173-184: The test method
testLeastRestrictive_ipUdtAndOther_returnsIpUdt is using SqlTypeName.NULL for
the second type which duplicates the IP+NULL scenario; change the second type to
SqlTypeName.OTHER by creating RelDataType otherType =
TYPE_FACTORY.createSqlType(SqlTypeName.OTHER) and pass List.of(ipUdt, otherType)
into TYPE_FACTORY.leastRestrictive so the code exercises the IP+OTHER branch;
the assertions remain the same (use OpenSearchTypeUtil.isIp and
result.isNullable()), and ensure the variable name (otherType) replaces nullType
in that call.
- Around line 20-22: Add an integration-style test method (e.g.,
testLeastRestrictiveViaSqlPipeline) to OpenSearchTypeFactoryTest that drives
leastRestrictive through Calcite's SQL->validation->Rel conversion pipeline:
construct a test schema with the same UDT types used by the existing unit tests,
parse a simple SQL SELECT that uses expressions/columns with those UDTs, run the
SqlValidator (using the test's SqlValidator and
OpenSearchSqlOperatorTable/SqlStdOperatorTable), convert the validated SqlNode
to a RelNode via SqlToRelConverter, and assert that the resulting RelNode types
(or the validator/typeFactory outputs) resolved the common type using
leastRestrictive as expected; reference the class OpenSearchTypeFactoryTest, the
type factory instance used in existing tests, and the Calcite pipeline classes
SqlParser/SqlValidator/SqlToRelConverter to locate where to integrate this new
test.
- Around line 24-278: Add three unit tests to OpenSearchTypeFactoryTest that
cover null/empty inputs for TYPE_FACTORY.leastRestrictive: (1)
testLeastRestrictive_nullInput_returnsNull — call
TYPE_FACTORY.leastRestrictive(null) and assertNull(result); (2)
testLeastRestrictive_listWithNullElement_returnsNull — call
TYPE_FACTORY.leastRestrictive(List.of(TYPE_FACTORY.createUDT(ExprUDT.EXPR_DATE),
null)) and assertNull(result); (3) testLeastRestrictive_emptyList_returnsNull —
call TYPE_FACTORY.leastRestrictive(List.of()) and assertNull(result). Name each
test accordingly and use the existing TYPE_FACTORY and ExprUDT symbols so they
integrate with the current test suite.
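As a stand-alone illustration of the null-handling contract these suggested tests describe (a toy string-based sketch, not the actual `OpenSearchTypeFactory.leastRestrictive`): a null input, an empty list, or a list containing a null element should each yield `null` rather than throwing.

```java
import java.util.Arrays;
import java.util.List;

public class LeastRestrictiveSketch {
  public static String leastRestrictive(List<String> types) {
    if (types == null || types.isEmpty()) {
      return null; // nothing to combine
    }
    for (String t : types) {
      if (t == null) {
        return null; // a null element means no common type can be derived
      }
    }
    // Toy widening rule: identical types stay as-is, otherwise widen to VARCHAR.
    String first = types.get(0);
    for (String t : types) {
      if (!t.equals(first)) {
        return "VARCHAR";
      }
    }
    return first;
  }

  public static void main(String[] args) {
    System.out.println(leastRestrictive(null)); // null
    System.out.println(leastRestrictive(List.of())); // null
    System.out.println(leastRestrictive(Arrays.asList("DATE", null))); // null
    System.out.println(leastRestrictive(List.of("DATE", "DATE"))); // DATE
  }
}
```

The real type factory operates on `RelDataType` instances and a richer type lattice; only the shape of the null/empty contract is shown here.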
In
`@core/src/test/java/org/opensearch/sql/calcite/validate/PplTypeCoercionTest.java`:
- Around line 101-113: The test testImplicitCast_preservesNullability currently
checks nullableResult but misses asserting nonNullableResult's nullability;
update the test in PplTypeCoercionTest (method
testImplicitCast_preservesNullability) to assert that nonNullableResult is
non-nullable by adding an assertion like
assertFalse(nonNullableResult.isNullable()) after the existing checks on
nonNullableResult, using the same typeCoercion.implicitCast invocation to verify
non-nullable input yields non-nullable output.
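The nullability-preservation property this prompt asks to assert can be shown with a tiny stand-alone sketch (the `ToyType` record is hypothetical, not the project's `TypeCoercion` API): an implicit cast changes the type name but carries the source's nullability through.

```java
public class NullabilityCastSketch {
  // Toy type: a SQL type name plus a nullable flag.
  public record ToyType(String name, boolean nullable) {}

  // Implicit cast keeps the source's nullability on the target type.
  public static ToyType implicitCast(ToyType source, String targetName) {
    return new ToyType(targetName, source.nullable());
  }

  public static void main(String[] args) {
    ToyType nullableInt = new ToyType("INTEGER", true);
    ToyType nonNullableInt = new ToyType("INTEGER", false);
    System.out.println(implicitCast(nullableInt, "DOUBLE").nullable());    // true
    System.out.println(implicitCast(nonNullableInt, "DOUBLE").nullable()); // false
  }
}
```

Asserting both the `true` and the `false` case, as the prompt suggests, is what guards this property against regressions.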
🧹 Nitpick comments (5)
core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtilTest.java (2)
19-45: Consider adding NULL input tests for type-checking methods. Only `isScalar` (line 342) includes a NULL input test. Per the coding guidelines, NULL input tests should be included for all new functions. Consider adding NULL input tests for `isUserDefinedType`, `isNumericOrCharacter`, `isDatetime`, `isDate`, `isTimestamp`, `isTime`, `isCharacter`, `isIp`, and `isBinary` to verify behavior when passed `null`.

Example NULL input tests to add:

```java
@Test
public void testIsUserDefinedType_withNull_returnsFalse() {
  assertFalse(OpenSearchTypeUtil.isUserDefinedType(null));
}

@Test
public void testIsNumericOrCharacter_withNull_returnsFalse() {
  assertFalse(OpenSearchTypeUtil.isNumericOrCharacter(null));
}

@Test
public void testIsDatetime_withNull_returnsFalse() {
  assertFalse(OpenSearchTypeUtil.isDatetime(null));
}

// Similar tests for isDate, isTimestamp, isTime, isCharacter, isIp, isBinary
```

As per coding guidelines: "NULL input tests must be included for all new functions".
8-10: Add a missing import for `java.util.List`. `java.util.List.of` is used inline at lines 369 and 372. Consider adding an import for consistency with the other imports.

Suggested import:

```diff
 import static org.junit.jupiter.api.Assertions.assertFalse;
 import static org.junit.jupiter.api.Assertions.assertTrue;
 import static org.opensearch.sql.calcite.utils.OpenSearchTypeFactory.TYPE_FACTORY;
+import java.util.List;
 import org.apache.calcite.rel.type.RelDataType;
```

Then update lines 369-372:

```diff
-        java.util.List.of(
+        List.of(
             TYPE_FACTORY.createSqlType(SqlTypeName.INTEGER),
             TYPE_FACTORY.createSqlType(SqlTypeName.VARCHAR)),
-        java.util.List.of("id", "name"));
+        List.of("id", "name"));
```

core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactoryTest.java (1)
24-25: Align test method names with the required pattern. Current names use underscores; the guideline expects `test<Functionality><Condition>` (camel-cased) naming. Please rename consistently (e.g., `testLeastRestrictiveDateUdtsOnlyReturnsDateUdt`).

As per coding guidelines, "Test naming should follow pattern `test<Functionality><Condition>`".
core/src/test/java/org/opensearch/sql/calcite/validate/shuttles/PplRelToSqlRelShuttleTest.java (2)
94-201: Consider adding boundary condition tests for interval literals. Per testing guidelines, boundary condition tests (min/max values, zero, negative inputs) should be included for new functions. The interval tests use specific positive values but don't cover edge cases:

- Zero interval value (e.g., `INTERVAL 0 DAY`)
- Negative interval value (if supported)
- Very large interval values near overflow boundaries

These edge cases could reveal issues with arithmetic operations in the transformation logic.
259-294: Test helper duplicates logic rather than testing the actual class. The `createRexShuttle` helper reimplements the exact transformation logic from `PplRelToSqlRelShuttle`'s internal `visitLiteral` method rather than delegating to the actual class. While this approach isolates and tests the algorithm correctly, it doesn't verify that `PplRelToSqlRelShuttle` itself functions as intended; if the production implementation diverges from this test copy, the tests would still pass.

Consider refactoring to instantiate `PplRelToSqlRelShuttle` and invoke its behavior directly, rather than reimplementing its logic inline in tests.
core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactoryTest.java
core/src/test/java/org/opensearch/sql/calcite/validate/PplTypeCoercionTest.java
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Description
This PR migrates from our custom type checking mechanism to Apache Calcite's native validation system by introducing a SqlNode validation layer. This addresses the lack of a proper SQL validation phase and resolves compatibility issues with user-defined types (UDTs).
This implements Approach 1 described in #3865.
How It Works
The PR introduces a validation layer in the query execution pipeline: the logical plan (RelNode) is converted to a SqlNode, validated with Calcite's SqlValidator (which performs type checking and implicit coercion), and then converted back into a validated RelNode before planning continues.
This approach leverages Calcite's robust type system while preserving OpenSearch's custom type semantics through careful type mapping and restoration.
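The type mapping step above can be sketched in miniature. This is a toy, stand-alone illustration with hypothetical enum names (`ExprUdt` and `SqlTypeLike` are stand-ins, not the PR's actual classes): each UDT is erased to its closest built-in SQL type before validation, and the mapping is kept so the UDT can be restored afterwards.

```java
import java.util.Map;

public class UdtErasureSketch {
  // Hypothetical stand-ins for OpenSearch UDTs and Calcite's SqlTypeName.
  public enum ExprUdt { EXPR_DATE, EXPR_TIME, EXPR_TIMESTAMP, EXPR_IP }
  public enum SqlTypeLike { DATE, TIME, TIMESTAMP, VARCHAR }

  // Erasure table: before validation, each UDT maps to a built-in SQL type.
  private static final Map<ExprUdt, SqlTypeLike> ERASE =
      Map.of(
          ExprUdt.EXPR_DATE, SqlTypeLike.DATE,
          ExprUdt.EXPR_TIME, SqlTypeLike.TIME,
          ExprUdt.EXPR_TIMESTAMP, SqlTypeLike.TIMESTAMP,
          ExprUdt.EXPR_IP, SqlTypeLike.VARCHAR);

  public static SqlTypeLike erase(ExprUdt udt) {
    return ERASE.get(udt);
  }

  public static void main(String[] args) {
    // EXPR_TIME is validated as plain TIME, then restored to the UDT afterwards.
    System.out.println(erase(ExprUdt.EXPR_TIME)); // TIME
  }
}
```

The real change performs this mapping on `RelDataType` instances inside the type factory; the sketch only shows the shape of the lookup.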
Benefits
- Enables rewrites such as `sqrt(x)` → `pow(x, 0.5)` before generating physical plans

Work Items / Implementation Details
Core Infrastructure
- Custom validator behavior for `deriveType`, `coerceOperandType`, and type inference methods

Type System Enhancements
- `SqlTypeName` enum extended with OpenSearch UDTs through dynamic type mapping: UDTs are mapped to built-in SQL types (`EXPR_TIME` → `SqlTypeName.TIME`) before validation
- `commonTypeForBinaryComparison` extended to enable datetime cross-type comparisons (DATE vs TIME, etc.)
- `SAFE_CAST` for string-to-number conversions to tolerate malformatted data; use `CAST` for literal numbers

Function & Operator Handling
- Array function type checking (`array_slice`, `reduce`, `mvappend`, etc.) with proper type inference
- JSON functions (`json_extract`, `json_set`, etc.)
- Math functions (`atan` overloading, percentile approximations)
- Aggregations (`DISTINCT_COUNT_APPROX`, `COUNT(*)` rewriting, etc.)
- `ADD` operator: string concatenation vs numeric addition
- `ATAN`: single-operand vs two-operand versions
- `GEOIP`: string overrides due to UDT erasure
- `SqlKind` for `DIVIDE` and `MOD` UDFs

Query Construct Support
- `RelHint` preserved through conversions (added `SqlHint` for `LogicalAggregate`)
- `bucket_nullable` flags
- `IN`/`NOT IN` with tuple inputs via row constructor rewriting

Dialect & Compatibility
- `LogicalValues` for empty row generation

Edge Cases & Fixes
- `SAFE_CAST`
- `RelToSqlConverter` is instantiated per-use (stateful component)

Test Fixes
- `CalcitePPLDateTimeBuiltinFunctionIT`: interval semantics
- `CalcitePPLBuiltinFunctionIT`: LOG function, sarg deserialization
- `CalciteArrayFunctionIT`: type checkers, reduce function inference
- `CalciteMathematicalFunctionIT`, `CalcitePPLAggregationIT`
- `CalciteBinCommandIT`: timestamp operations, windowed aggregates in GROUP BY
- `CalciteStreamstatsCommandIT`: sort columns, `bucket_nullable`
- `CalcitePPLJsonBuiltinFunctionIT`: string conversion

Code Cleanup
- `PPLFuncImpTable`
- `UDFOperandMetadata.wrapUDT` interface
- `CalciteFuncSignature`, `PPLTypeChecker`
- `EnhancedCoalesce` merged into built-in `Coalesce`

Optimizations
- Avoid `SAFE_CAST` on non-string literal numbers (use `CAST` for better performance)
- `dedup` optimization issues

Performance Impact
Profiled on a personal laptop; each test runs twice and the results are averaged.

Conclusion: No significant performance degradation. ClickBench shows slight overhead (+12.3%) during the analyze phase due to validation, but big5 and TPCH show improvements, likely from better query optimization enabled by proper type information.
Related Issues
Resolves #4636, resolves #3865
Check List
Commits are signed per the DCO using `--signoff` or `-s`.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.