[SPARK-33469][SQL] Add current_timezone function #30400


Closed
wants to merge 5 commits

Conversation

@ulysses-you (Contributor) commented Nov 17, 2020

What changes were proposed in this pull request?

Add a CurrentTimeZone expression and replace it with the session timezone value on the Optimizer side.
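A hedged sketch of the replacement idea, using toy expression types rather than Spark's actual Catalyst classes: the unevaluable CurrentTimeZone node is folded into a string literal carrying the session timezone before execution.

```scala
// Toy model (not Spark's Catalyst API): CurrentTimeZone is unevaluable,
// so the optimizer must rewrite it into a literal before execution.
sealed trait Expr
case object CurrentTimeZone extends Expr
case class StringLiteral(value: String) extends Expr

def replaceCurrentTimeZone(e: Expr, sessionTz: String): Expr = e match {
  case CurrentTimeZone => StringLiteral(sessionTz) // fold to a constant
  case other           => other                    // leave other nodes alone
}

println(replaceCurrentTimeZone(CurrentTimeZone, "America/Los_Angeles"))
// StringLiteral(America/Los_Angeles)
```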

Why are the changes needed?

Let users get the current timezone easily. A user can then call:

SELECT current_timezone()

Presto: https://prestodb.io/docs/current/functions/datetime.html
SQL Server: https://docs.microsoft.com/en-us/sql/t-sql/functions/current-timezone-transact-sql?view=sql-server-ver15
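As a rough illustration (plain Scala, not the PR's implementation), the function conceptually just surfaces the session timezone setting, falling back to the JVM default zone the way spark.sql.session.timeZone does when unset; the conf map here is a hypothetical stand-in:

```scala
import java.time.ZoneId

// Minimal sketch: look up a hypothetical session conf map. The key name
// matches Spark's real config, but the lookup logic is illustrative.
def currentTimezone(sessionConf: Map[String, String]): String =
  sessionConf.getOrElse("spark.sql.session.timeZone", ZoneId.systemDefault().getId)

println(currentTimezone(Map("spark.sql.session.timeZone" -> "America/Los_Angeles")))
// America/Los_Angeles
```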

Does this PR introduce any user-facing change?

Yes, a new function.

How was this patch tested?

Add test.

@github-actions github-actions bot added the SQL label Nov 17, 2020

SparkQA commented Nov 17, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35832/


SparkQA commented Nov 17, 2020

Test build #131229 has finished for PR 30400 at commit 2e48507.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class CurrentTimeZone() extends LeafExpression with Unevaluable


SparkQA commented Nov 17, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35832/

@dongjoon-hyun (Member) left a comment

Thank you, @ulysses-you. Is the function name current_timezone popular in other DBMSs?

@ulysses-you (Contributor, Author)

@dongjoon-hyun Yes, Presto has this function.


SparkQA commented Nov 18, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35850/


SparkQA commented Nov 18, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35850/


SparkQA commented Nov 18, 2020

Test build #131247 has finished for PR 30400 at commit 73f71c5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member)

Retest this please

@@ -73,6 +73,21 @@ trait TimestampFormatterHelper extends TimeZoneAwareExpression {
}
}

@ExpressionDescription(
usage = "_FUNC_() - Returns the current timezone.",
Member review comment:

Do we need to say explicitly that it's the session-local timezone?

test("SPARK-33469: Add current_timezone function") {
  val df = Seq(1).toDF("c")
  val timezone = df.selectExpr("current_timezone()").collect().head.getString(0)
  assert(timezone == SQLConf.get.sessionLocalTimeZone)
}
Member review comment:

For this test, is it possible to set sessionLocalTimeZone explicitly and check with it?
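A hedged sketch of that suggestion, with a tiny stand-in for SQLConf so the shape is runnable outside Spark (the real test would use Spark's conf helpers such as a withSQLConf-style block): set the timezone explicitly, then assert the function returns exactly that value.

```scala
// Stand-in for Spark's SQLConf (illustrative only): holds a mutable
// session timezone so the assertion can pin an explicit value.
object FakeSQLConf {
  var sessionLocalTimeZone: String = java.time.ZoneId.systemDefault().getId
}

// Stand-in for evaluating current_timezone() against the session conf.
def currentTimezoneValue(): String = FakeSQLConf.sessionLocalTimeZone

// Set the timezone explicitly, then check the function echoes it back,
// instead of comparing two reads of the same default.
FakeSQLConf.sessionLocalTimeZone = "Asia/Shanghai"
assert(currentTimezoneValue() == "Asia/Shanghai")
```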

@@ -150,6 +150,7 @@ class ExpressionInfoSuite extends SparkFunSuite with SharedSparkSession {
"org.apache.spark.sql.catalyst.expressions.CurrentDate",
"org.apache.spark.sql.catalyst.expressions.CurrentTimestamp",
"org.apache.spark.sql.catalyst.expressions.Now",
"org.apache.spark.sql.catalyst.expressions.CurrentTimeZone",
Member review comment:

Shall we add this after CurrentTimestamp like the following?

"org.apache.spark.sql.catalyst.expressions.CurrentTimestamp",
"org.apache.spark.sql.catalyst.expressions.CurrentTimeZone",
"org.apache.spark.sql.catalyst.expressions.Now",

@dongjoon-hyun (Member) left a comment

@SparkQA
Copy link

SparkQA commented Nov 20, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36041/

@MaxGekk (Member) left a comment

I worry about the motivation for adding the new function to the public API.

Let user get current timezone easily.

At the moment, users can get the session time zone from the SQL config spark.sql.session.timeZone. It doesn't seem to be so difficult, does it?

Presto, SQL Server ...

As far as I know, Spark doesn't claim to be compatible with those systems.

What is the real use case when the function is preferable over reading the SQL config?


dongjoon-hyun commented Nov 20, 2020

Good point, @MaxGekk . Could you give us your use cases, @ulysses-you ?

IMO, in the Spark SQL world (Spark Thrift Server or Spark SQL Shell), the following is the only way.

spark-sql> set spark.sql.session.timeZone;
spark.sql.session.timeZone	America/Los_Angeles
Time taken: 0.034 seconds, Fetched 1 row(s)

In other words, we cannot use it in Spark SQL DML statements. I guess you have some use cases for that.


SparkQA commented Nov 20, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36041/


SparkQA commented Nov 20, 2020

Test build #131435 has finished for PR 30400 at commit 73f71c5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


ulysses-you commented Nov 21, 2020

@dongjoon-hyun @MaxGekk The difference between the config and the function is that the latter can be computed together with other data. That is, we can compare it against other values or save it into a storage engine. We have introduced many date-related functions in recent versions, so the timezone is important.

In other words, we cannot use it in Spark SQL DML statements. I guess you have some use cases for that.

Ah, actually I have not seen a concrete case for it; I just think we may need it at some point.

We can also show the offset, as @maropu said, which would make it different from set spark.sql.session.timeZone.
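For the offset idea, a plain-Scala sketch (not part of this PR's implementation) of computing a zone's current UTC offset, which a variant of the function could display alongside the region name:

```scala
import java.time.{ZoneId, ZonedDateTime}

// Current UTC offset for a zone ID, e.g. "-08:00" for
// America/Los_Angeles in winter, "-07:00" during DST.
def zoneOffset(zoneId: String): String =
  ZonedDateTime.now(ZoneId.of(zoneId)).getOffset.getId

println(zoneOffset("America/Los_Angeles"))
```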


SparkQA commented Nov 21, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36062/


SparkQA commented Nov 21, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36062/

@dongjoon-hyun (Member) left a comment

Please let me and @ulysses-you know your final opinion, @maropu and @MaxGekk .
Although this looks good enough to me, I'll leave this PR open for your opinions. Please feel free to comment with any final opinions.

Also, cc @cloud-fan .


SparkQA commented Nov 21, 2020

Test build #131456 has finished for PR 30400 at commit b591ff7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member)

Could you rebase this, @ulysses-you? There was a bug fix for the GitHub Actions Hive test failure in the master branch.


maropu commented Nov 21, 2020

Since this feature might be useful for SQL users, adding it looks fine to me.


SparkQA commented Nov 22, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36100/


SparkQA commented Nov 22, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36100/


SparkQA commented Nov 22, 2020

Test build #131496 has finished for PR 30400 at commit c83da46.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class ExecutorSource(
  • case class GetShufflePushMergerLocations(numMergersNeeded: Int, hostsToFilter: Set[String])
  • case class RemoveShufflePushMergerLocation(host: String) extends ToBlockManagerMaster
  • abstract class LikeAllBase extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant
  • case class LikeAll(child: Expression, patterns: Seq[UTF8String]) extends LikeAllBase
  • case class NotLikeAll(child: Expression, patterns: Seq[UTF8String]) extends LikeAllBase
  • case class ParseUrl(children: Seq[Expression], failOnError: Boolean = SQLConf.get.ansiEnabled)
  • implicit class MetadataColumnsHelper(metadata: Array[MetadataColumn])

@dongjoon-hyun (Member)

Thank you, @ulysses-you and @maropu and @MaxGekk .
Merged to master for Apache Spark 3.1.0.

@ulysses-you (Contributor, Author)

Thanks for merging!

@ulysses-you ulysses-you deleted the SPARK-33469 branch November 22, 2021 10:26