Skip to content

Conversation

@LantaoJin
Copy link
Member

@LantaoJin LantaoJin commented Mar 17, 2025

Description

Support ppl CAST function with Calcite

Related Issues

Resolves #3417

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
INTERVAL(ExprCoreType.INTERVAL);
INTERVAL(ExprCoreType.INTERVAL),

IP(ExprCoreType.IP);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cast_to_IP unsupported yet since IP type is not supported with Calcite. But this is a missing code in DataType class.

Comment on lines 369 to 372
RelDataType type = context.rexBuilder.getTypeFactory().createSqlType(sqlTypeName);
RelDataType nullableType =
context.rexBuilder.getTypeFactory().createTypeWithNullability(type, true);
return context.rexBuilder.makeCast(SqlParserPos.ZERO, nullableType, expr, false, true);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use this 4 lines instead of context.relbuilder.cast() because we need the last parameter to true

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does Calcite has same Data Type Conversion rule? anything we want to call-out and update doc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add more test for this table


@Test
public void testCastToUnknownType() {
assertThrows(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is error message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in V2, the cast function has a convertedType list.

convertedDataType
: typeName = DATE
| typeName = TIME
| typeName = TIMESTAMP
| typeName = INT
| typeName = INTEGER
| typeName = DOUBLE
| typeName = LONG
| typeName = FLOAT
| typeName = STRING
| typeName = BOOLEAN
| typeName = IP
| typeName = JSON
;

Any unsupported type will throw SyntaxCheckException, no matter what the message is.

"source=%s | eval a = cast(state as string) | fields a",
TEST_INDEX_STATE_COUNTRY_WITH_NULL));

assertEquals(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not verifyDataRows?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how to check null row with verifyDataRows
The method rows() cannot work with null.

String actual =
execute(
String.format(
"source=%s | eval a = cast(name as boolean) | fields a", TEST_INDEX_STATE_COUNTRY));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cast("1" as boolean) should return True?

Copy link
Member Author

@LantaoJin LantaoJin Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, only 'true', 'True', 'TRUE' etc could return True in cast_to_boolean.
cast(1 as boolean), cast(2.5 as boolean) are all return false and cast("1" as boolean) return null.

This seems a breaking change comparing to V2:
cast("1" as boolean) returns false
cast(1 as boolean), cast(2.5 as boolean) return true
cast(0 as boolean) returns false

Copy link
Member Author

@LantaoJin LantaoJin Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apache Spark:
select cast(1 as boolean); -- true
select cast(2 as boolean); -- true
select cast(0 as boolean); -- false

select cast('1' as boolean); -- true
select cast('2' as boolean); -- null
select cast('0' as boolean); -- false

select cast('aa' as boolean); -- null

Em, I think this is a bug of Calcite, we can fix it in our side.

Copy link
Member Author

@LantaoJin LantaoJin Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open #3443 for discussion

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest commit follows the behaviour of SparkSQL. Check above

Comment on lines 369 to 372
RelDataType type = context.rexBuilder.getTypeFactory().createSqlType(sqlTypeName);
RelDataType nullableType =
context.rexBuilder.getTypeFactory().createTypeWithNullability(type, true);
return context.rexBuilder.makeCast(SqlParserPos.ZERO, nullableType, expr, false, true);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does Calcite has same Data Type Conversion rule? anything we want to call-out and update doc?

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@LantaoJin
Copy link
Member Author

LantaoJin commented Mar 18, 2025

does Calcite has same Data Type Conversion rule? anything we want to call-out and update doc?

Call-out:

  1. cast_to_boolean behaviour is different with V2. But we can discuss which would we prefer to, see [BUG] Cast_to_boolean behaviour is different in OpenSearch SQL and Spark SQL #3443
  2. safe-cast behaviour is different with V2. Cast will return null instead of throwing exception if cast fails. But we can discuss which would we prefer to. (Spark/postgre use safe-cast)
  3. cast boolean to float/double trigger a Calcite bug. See the new tests added in CalcitePPLCastFunctionIT

Copy link
Collaborator

@penghuo penghuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix IT before merge.

Signed-off-by: Lantao Jin <ltjin@amazon.com>
@LantaoJin LantaoJin merged commit 0022a31 into opensearch-project:feature/calcite-engine Mar 19, 2025
19 of 20 checks passed
@LantaoJin
Copy link
Member Author

Failure caused by a flaky test CalcitePPLStringBuiltinFunctionIT > testPosition FAILED, Merging to dev branch.

penghuo pushed a commit that referenced this pull request Jun 16, 2025
* init commit

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Support ppl CAST function with Calcite

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Add more tests

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix ci issue

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

calcite calcite migration releated

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants