
[SPARK-37438][SQL] ANSI mode: Use store assignment rules for resolving function invocation #34681


Closed


@gengliangwang (Member) commented Nov 22, 2021

What changes were proposed in this pull request?

Under ANSI mode (spark.sql.ansi.enabled=true), function invocation in Spark SQL works as follows:

  • In general, it follows the store assignment rules: each input value is treated as if it were stored into the declared parameter type of the SQL function.
  • Special rules apply to string literals and untyped NULL: a NULL can be promoted to any other type, while a string literal can be promoted to any simple data type.
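To make the two rules concrete, here is a minimal Scala sketch. It is a simplified, hypothetical model and not Spark's implementation: the type names, the `Arg` wrapper, and the exact set of store-assignable pairs in `storeAssignable` are assumptions chosen to mirror the description above.

```scala
// Hypothetical, simplified model of the ANSI-mode argument rules described above.
// The types and functions here are illustrative only; they are not Spark's API.
object StoreAssignmentSketch {
  sealed trait SqlType
  case object IntType extends SqlType
  case object LongType extends SqlType
  case object DoubleType extends SqlType
  case object StringType extends SqlType
  case object DateType extends SqlType
  case object TimestampType extends SqlType
  case object NullType extends SqlType // type of an untyped NULL literal
  final case class ArrayType(elem: SqlType) extends SqlType // a non-simple type

  // An argument: its type, and whether it is written as a literal in the query.
  final case class Arg(tpe: SqlType, isLiteral: Boolean = false)

  val numeric: Set[SqlType] = Set(IntType, LongType, DoubleType)

  def isSimple(t: SqlType): Boolean = t match {
    case _: ArrayType => false
    case _            => true
  }

  // Assumed subset of store assignment: may a value of `from` be stored as `to`?
  def storeAssignable(from: SqlType, to: SqlType): Boolean = (from, to) match {
    case (f, t) if f == t                   => true
    case (f, t) if numeric(f) && numeric(t) => true // includes narrowing, e.g. Double => Long
    case (f, StringType) if isSimple(f)     => true // e.g. Date => String
    case (TimestampType, DateType)          => true
    case (DateType, TimestampType)          => true
    case (ArrayType(f), ArrayType(t))       => storeAssignable(f, t)
    case _                                  => false
  }

  // Can `arg` be passed to a parameter declared as `param`?
  def canInvoke(arg: Arg, param: SqlType): Boolean = arg match {
    case Arg(NullType, _)                         => true // untyped NULL: any type
    case Arg(StringType, true) if isSimple(param) => true // string literal: any simple type
    case Arg(tpe, _)                              => storeAssignable(tpe, param)
  }
}
```

In this sketch, `canInvoke(Arg(DoubleType), LongType)` succeeds through store assignment, while `canInvoke(Arg(StringType, isLiteral = true), IntType)` succeeds only because of the string-literal rule; the same call with a non-literal string argument is rejected.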

Why are the changes needed?

Currently, ANSI SQL mode resolves function invocations with Least Common Type Resolution based on the Type precedence list. A closer look at the ANSI SQL standard shows that the "store assignment" syntax rules should be used for type coercion between the inputs and the parameters of a SQL function, while the Type precedence list is used for "Subject routine determination" (i.e., choosing among SQL function overloads).
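The division of labor between the two mechanisms can be sketched as follows, under simplifying assumptions. Overload selection walks each argument's type precedence list from left to right and keeps the candidates whose parameter type ranks highest; once a routine is chosen, its arguments would then be coerced with store assignment. The precedence lists, type names, and the `Routine` class below are hypothetical and far smaller than what the standard defines.

```scala
// Hypothetical, simplified sketch of "subject routine determination":
// choose an overload by walking each argument's type precedence list.
// The type names and precedence lists below are illustrative assumptions.
object OverloadSketch {
  // Assumed precedence lists: most preferred parameter type first.
  val precedence: Map[String, List[String]] = Map(
    "INT"    -> List("INT", "BIGINT", "DOUBLE"),
    "BIGINT" -> List("BIGINT", "DOUBLE"),
    "DOUBLE" -> List("DOUBLE")
  )

  // A candidate routine with its declared parameter types.
  final case class Routine(name: String, params: List[String])

  def resolve(argTypes: List[String], candidates: List[Routine]): Option[Routine] = {
    // Keep candidates whose every parameter type is in the matching argument's list.
    val applicable = candidates.filter { r =>
      r.params.length == argTypes.length &&
      argTypes.zip(r.params).forall { case (a, p) => precedence.getOrElse(a, Nil).contains(p) }
    }
    // Left to right, keep only the candidates whose parameter type ranks best.
    val survivors = argTypes.indices.foldLeft(applicable) { (remaining, i) =>
      if (remaining.isEmpty) remaining
      else {
        val order = precedence(argTypes(i))
        val best = remaining.map(r => order.indexOf(r.params(i))).min
        remaining.filter(r => order.indexOf(r.params(i)) == best)
      }
    }
    // A real resolver would report ambiguity; this sketch just takes the first.
    survivors.headOption
  }
}
```

With candidates taking BIGINT and DOUBLE, an INT argument picks the BIGINT routine because BIGINT appears earlier in INT's assumed precedence list.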

I have also analyzed real-world SQL queries. The following implicit function casts are not allowed under Least Common Type Resolution, but they are commonly seen:

  • Numeric/Date/Timestamp => String, e.g. the Tableau-generated query fragment CONCAT(DATE_ADD(%1, CAST(%2 AS INT)), SUBSTR(CAST(%1 AS TIMESTAMP), 11)) AS TIMESTAMP)
  • Timestamp => Date, e.g. date_sub(now(), 7) < ...
  • Double => Long, e.g. from_unixtime(updated/1000); note that updated/1000 is a Double, because both updated and 1000 are converted to Double before the division.
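The three cases above can be checked against two simplified cast policies. In this hypothetical sketch, `widenOnly` stands in for Least Common Type Resolution (widening casts only) and `storeAssign` for store assignment; the type names and the widening order are assumptions, not Spark's actual rules.

```scala
// Hypothetical comparison of two implicit-cast policies for function arguments.
// Neither function is Spark's implementation; the types are simplified assumptions.
object CastPolicySketch {
  // Assumed numeric widening order, narrowest first.
  val numericWidening: List[String] = List("INT", "BIGINT", "DOUBLE")

  // Least-common-type style: an argument may only be widened to the parameter type.
  def widenOnly(from: String, to: String): Boolean = {
    val (i, j) = (numericWidening.indexOf(from), numericWidening.indexOf(to))
    from == to || (i >= 0 && j >= 0 && i <= j) || (from == "DATE" && to == "TIMESTAMP")
  }

  // Store-assignment style: additionally allows any numeric-to-numeric cast
  // (including narrowing), any type to STRING, and TIMESTAMP to DATE.
  def storeAssign(from: String, to: String): Boolean =
    widenOnly(from, to) ||
      (numericWidening.contains(from) && numericWidening.contains(to)) ||
      to == "STRING" ||
      (from == "TIMESTAMP" && to == "DATE")
}
```

Each listed cast (DOUBLE to BIGINT, TIMESTAMP to DATE, DATE to STRING) is rejected by `widenOnly` but accepted by `storeAssign`.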

The changes in this PR are ANSI compatible and should ease the adoption of ANSI SQL mode.

Does this PR introduce any user-facing change?

Yes. Under ANSI mode, function invocations are resolved using store assignment rules.

How was this patch tested?

Unit tests


@SparkQA commented Nov 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49980/

@SparkQA commented Nov 22, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49980/

@SparkQA commented Nov 22, 2021

Test build #145508 has finished for PR 34681 at commit 8d0b522.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -348,6 +290,9 @@ object AnsiTypeCoercion extends TypeCoercionBase {
// Skip nodes who's children have not been resolved yet.
case e if !e.childrenResolved => e

case d @ DateAdd(AnyTimestampType(), _) => d.copy(startDate = Cast(d.startDate, DateType))
@gengliangwang (Member Author) commented on this hunk:

We should refactor these functions to extend ImplicitCastInputTypes later.

@SparkQA commented Nov 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49987/

@SparkQA commented Nov 22, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49987/

@SparkQA commented Nov 22, 2021

Test build #145513 has finished for PR 34681 at commit 65a8758.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Signed-off-by: Karen Feng <karen.feng@databricks.com>
@SparkQA commented Nov 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50000/

@SparkQA commented Nov 23, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50000/

@gengliangwang (Member Author) commented:

Merging to master.

@SparkQA commented Nov 23, 2021

Test build #145528 has finished for PR 34681 at commit b7d383e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


4 participants