[SPARK-16078] [SQL] from_utc_timestamp/to_utc_timestamp should not depends on local timezone #13784

Closed. davies wants to merge 3 commits.

davies commented Jun 20, 2016

What changes were proposed in this pull request?

Currently, we use the local timezone to parse or format a timestamp (TimestampType), and store it as a Long holding the microseconds since the epoch in UTC.

from_utc_timestamp() and to_utc_timestamp() did not take the local timezone into account, so they could return different results under different local timezones.
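
To make the dependence concrete, here is a minimal sketch that replays the pre-patch offset arithmetic quoted later in this thread under two different local timezones. The object name `OldFromUtc` and the helpers `toMicros`, `toWallClock` and `render` are hypothetical, and the local timezone is passed explicitly instead of being read from the JVM default; this is an illustration, not Spark's code path.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}
import java.util.TimeZone

object OldFromUtc {
  // Parse a wall-clock reading in the local zone into microseconds since the epoch.
  def toMicros(wall: LocalDateTime, local: ZoneId): Long =
    wall.atZone(local).toInstant.toEpochMilli * 1000L

  // Format stored microseconds back into a wall-clock reading in the local zone.
  def toWallClock(micros: Long, local: ZoneId): LocalDateTime =
    LocalDateTime.ofInstant(Instant.ofEpochMilli(Math.floorDiv(micros, 1000L)), local)

  // Pre-patch arithmetic: look up the target zone's offset at the stored
  // instant and add it, without considering the local zone.
  def oldFromUtc(time: Long, timeZone: String): Long = {
    val tz = TimeZone.getTimeZone(timeZone)
    val offset = tz.getOffset(time / 1000L)
    time + offset * 1000L
  }

  // What a user would see: parse in `local`, convert, format in `local`.
  def render(wall: LocalDateTime, local: ZoneId): LocalDateTime =
    toWallClock(oldFromUtc(toMicros(wall, local), "America/Los_Angeles"), local)

  def main(args: Array[String]): Unit = {
    // 10:30 UTC on the day US daylight saving time started in 2016.
    val wall = LocalDateTime.of(2016, 3, 13, 10, 30)
    println(render(wall, ZoneId.of("UTC")))           // 2016-03-13T03:30
    println(render(wall, ZoneId.of("Asia/Shanghai"))) // 2016-03-13T02:30
  }
}
```

With a local timezone of Asia/Shanghai the stored instant is eight hours earlier, so the Los Angeles offset is looked up before the DST switch and the result is off by one hour.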

This PR does the conversion based on human time (in the local timezone), so it should return the same result in any timezone. However, because the mapping from an absolute timestamp to human time is not exactly one-to-one, it will still return wrong results in some timezones (and at the beginning or end of DST).
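
One way to picture a conversion based on human time is the sketch below. It is written against `java.time` rather than the `java.util.TimeZone` API used in the patch, and `ConvertTzSketch`, its helpers, and the explicit `local` parameter are all assumptions made for illustration; it is not the `convertTz` implementation from this PR.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}

object ConvertTzSketch {
  private val MicrosPerSecond = 1000000L

  // Wall-clock reading -> stored microseconds, interpreted in the local zone.
  def toMicros(wall: LocalDateTime, local: ZoneId): Long = {
    val instant = wall.atZone(local).toInstant
    instant.getEpochSecond * MicrosPerSecond + instant.getNano / 1000L
  }

  // Stored microseconds -> wall-clock reading in the local zone.
  def toWallClock(micros: Long, local: ZoneId): LocalDateTime = {
    val seconds = Math.floorDiv(micros, MicrosPerSecond)
    val nanos = Math.floorMod(micros, MicrosPerSecond) * 1000L
    LocalDateTime.ofInstant(Instant.ofEpochSecond(seconds, nanos), local)
  }

  def convertTz(micros: Long, from: ZoneId, to: ZoneId, local: ZoneId): Long = {
    // 1. Recover the human time the user wrote or will read.
    val human = toWallClock(micros, local)
    // 2. Treat that reading as a time in `from` and view the same instant in `to`.
    val shifted = human.atZone(from).withZoneSameInstant(to).toLocalDateTime
    // 3. Re-encode the shifted reading in the local zone so it formats as expected.
    //    Readings that fall in a DST gap or overlap of `from` or `local` are
    //    resolved by java.time's defaults, which is where results can still be off.
    toMicros(shifted, local)
  }
}
```

For example, `convertTz(toMicros(wall, local), ZoneId.of("UTC"), ZoneId.of("America/Los_Angeles"), local)` renders as the same wall-clock reading whether `local` is UTC or Asia/Shanghai, as long as none of the readings involved falls in a DST transition.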

This PR is a best-effort fix. In the long term, we should make TimestampType timezone-aware to fix this completely.
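
The reason a fix based on human time can only be best effort is visible in the timezone rules themselves: some wall-clock readings map to no instant and others map to two. The sketch below counts the valid UTC offsets for a reading using `java.time`; the object and method names are hypothetical, and America/Los_Angeles is just a convenient example zone.

```scala
import java.time.{LocalDateTime, ZoneId}

object DstMapping {
  private val zone = ZoneId.of("America/Los_Angeles")

  // Number of UTC offsets valid for a wall-clock reading in `zone`:
  // 1 = ordinary time, 0 = skipped (spring-forward gap),
  // 2 = repeated (fall-back overlap).
  def validOffsetCount(wallClock: LocalDateTime): Int =
    zone.getRules.getValidOffsets(wallClock).size

  def main(args: Array[String]): Unit = {
    println(validOffsetCount(LocalDateTime.of(2016, 6, 20, 12, 0)))  // 1
    println(validOffsetCount(LocalDateTime.of(2016, 3, 13, 2, 30)))  // 0
    println(validOffsetCount(LocalDateTime.of(2016, 11, 6, 1, 30)))  // 2
  }
}
```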

How was this patch tested?

Tested these functions in all timezones.
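
A check across all timezones could look roughly like the sketch below. It is not the test added in this PR: `fromUtc` is a compact, hypothetical stand-in for the function under test, the local timezone is passed as a parameter rather than set as the JVM default, and the set of zone IDs depends on the JVM's timezone data. It uses `scala.collection.JavaConverters`, which is deprecated in Scala 2.13 in favour of `scala.jdk.CollectionConverters`.

```scala
import java.time.{LocalDateTime, ZoneId}
import scala.collection.JavaConverters._

object AllZonesCheck {
  // Hypothetical stand-in for from_utc_timestamp: parse in `local`,
  // shift the human time from UTC to `target`, format in `local`.
  def fromUtc(wall: LocalDateTime, target: ZoneId, local: ZoneId): LocalDateTime = {
    val stored = wall.atZone(local).toInstant
    val human = LocalDateTime.ofInstant(stored, local)
    val shifted = human.atZone(ZoneId.of("UTC")).withZoneSameInstant(target).toLocalDateTime
    LocalDateTime.ofInstant(shifted.atZone(local).toInstant, local)
  }

  // IDs of local zones under which the rendered result differs from `expected`.
  def disagreeingZones(wall: LocalDateTime, target: ZoneId, expected: LocalDateTime): Seq[String] =
    ZoneId.getAvailableZoneIds.asScala.toSeq.sorted.filter { id =>
      fromUtc(wall, target, ZoneId.of(id)) != expected
    }

  def main(args: Array[String]): Unit = {
    val bad = disagreeingZones(
      LocalDateTime.of(2016, 6, 20, 12, 0),
      ZoneId.of("Asia/Shanghai"),
      LocalDateTime.of(2016, 6, 20, 20, 0))
    println(s"local zones with a different result: ${bad.mkString(", ")}")
  }
}
```

Any zone that shows up in the output would be one where the input or output reading collides with a DST transition of that local zone.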

SparkQA commented Jun 20, 2016

Test build #60869 has finished for PR 13784 at commit 5c60bc6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

davies commented Jun 20, 2016

cc @hvanhovell

val tzClass = classOf[TimeZone].getName
ctx.addMutableState(tzClass, tzTerm, s"""$tzTerm = $tzClass.getTimeZone("$tz");""")
ctx.addMutableState(tzClass, utcTerm, s"""$utcTerm = $tzClass.getTimeZone("GMT");""")

UTC? Coordinated Universal Time and Greenwich Mean Time are the same in practice (GMT is a timezone, UTC is not), but let's use one for consistency.
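
As a quick sanity check of that equivalence in `java.util.TimeZone`, the sketch below confirms that both IDs resolve to a zone with zero offset and no daylight saving. The object and method names are hypothetical. It also shows why sticking to one well-known ID is worthwhile: `getTimeZone` silently falls back to GMT when it does not understand an ID.

```scala
import java.util.TimeZone

object GmtVsUtc {
  // True when the ID resolves to a zone that never deviates from UTC+0.
  def isZeroOffsetZone(id: String): Boolean = {
    val tz = TimeZone.getTimeZone(id)
    tz.getRawOffset == 0 && !tz.useDaylightTime && tz.getOffset(System.currentTimeMillis()) == 0
  }

  def main(args: Array[String]): Unit = {
    println(isZeroOffsetZone("GMT"))  // true
    println(isZeroOffsetZone("UTC"))  // true
    // An unrecognised ID does not throw; it yields the GMT zone.
    println(TimeZone.getTimeZone("Not/AZone").getID)  // GMT
  }
}
```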

SparkQA commented Jun 21, 2016

Test build #60888 has finished for PR 13784 at commit 4bba902.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

- val tz = TimeZone.getTimeZone(timeZone)
- val offset = tz.getOffset(time / 1000L)
- time + offset * 1000L
+ convertTz(time, TimeZoneGMT, TimeZone.getTimeZone(timeZone))

For fromUTCTime, this adds a little overhead.

Let's try to make it correct first. Further optimizations are always possible.

SparkQA commented Jun 21, 2016

Test build #60890 has finished for PR 13784 at commit 659c9fe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

hvanhovell commented

LGTM - thanks! Merging to master/2.0

asfgit pushed a commit that referenced this pull request Jun 22, 2016
…ends on local timezone

Author: Davies Liu <davies@databricks.com>

Closes #13784 from davies/convert_tz.

(cherry picked from commit 20d411b)
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
@asfgit asfgit closed this in 20d411b Jun 22, 2016
asfgit pushed a commit that referenced this pull request Oct 20, 2016
…ld not depends on local timezone

## What changes were proposed in this pull request?

Back-port of #13784 to `branch-1.6`

## How was this patch tested?

Existing tests.

Author: Davies Liu <davies@databricks.com>

Closes #15554 from srowen/SPARK-16078.
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Oct 20, 2016
…ld not depends on local timezone

## What changes were proposed in this pull request?

Back-port of apache#13784 to `branch-1.6`

## How was this patch tested?

Existing tests.

Author: Davies Liu <davies@databricks.com>

Closes apache#15554 from srowen/SPARK-16078.

(cherry picked from commit 82e98f1)