
Commit ff87613

[SPARK-23715][SQL][DOC] improve document for from/to_utc_timestamp
## What changes were proposed in this pull request?

We have an agreement that the behavior of `from/to_utc_timestamp` is correct, although the function itself doesn't make much sense in Spark: https://issues.apache.org/jira/browse/SPARK-23715

This PR improves the documentation.

## How was this patch tested?

N/A

Closes #22543 from cloud-fan/doc.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
1 parent f309b28

File tree: 3 files changed (+68, -18 lines)

R/pkg/R/functions.R

Lines changed: 20 additions & 6 deletions
```diff
@@ -2204,9 +2204,16 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
 })

 #' @details
-#' \code{from_utc_timestamp}: Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a
-#' time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1'
-#' would yield '2017-07-14 03:40:00.0'.
+#' \code{from_utc_timestamp}: This is a common function for databases supporting TIMESTAMP WITHOUT
+#' TIMEZONE. This function takes a timestamp which is timezone-agnostic, and interprets it as a
+#' timestamp in UTC, and renders that timestamp as a timestamp in the given time zone.
+#' However, a timestamp in Spark represents the number of microseconds from the Unix epoch, which
+#' is not timezone-agnostic. So in Spark this function just shifts the timestamp value from the
+#' UTC time zone to the given time zone.
+#' This function may return a confusing result if the input is a string with a timezone, e.g.
+#' (\code{2018-03-13T06:18:23+00:00}). The reason is that Spark first casts the string to a
+#' timestamp according to the timezone in the string, and then displays the result by converting
+#' the timestamp to a string according to the session local timezone.
 #'
 #' @rdname column_datetime_diff_functions
 #'
@@ -2262,9 +2269,16 @@ setMethod("next_day", signature(y = "Column", x = "character"),
 })

 #' @details
-#' \code{to_utc_timestamp}: Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a
-#' time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1'
-#' would yield '2017-07-14 01:40:00.0'.
+#' \code{to_utc_timestamp}: This is a common function for databases supporting TIMESTAMP WITHOUT
+#' TIMEZONE. This function takes a timestamp which is timezone-agnostic, and interprets it as a
+#' timestamp in the given time zone, and renders that timestamp as a timestamp in UTC.
+#' However, a timestamp in Spark represents the number of microseconds from the Unix epoch, which
+#' is not timezone-agnostic. So in Spark this function just shifts the timestamp value from the
+#' given time zone to the UTC time zone.
+#' This function may return a confusing result if the input is a string with a timezone, e.g.
+#' (\code{2018-03-13T06:18:23+00:00}). The reason is that Spark first casts the string to a
+#' timestamp according to the timezone in the string, and then displays the result by converting
+#' the timestamp to a string according to the session local timezone.
 #'
 #' @rdname column_datetime_diff_functions
 #' @aliases to_utc_timestamp to_utc_timestamp,Column,character-method
```
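To make the documented shift concrete, here is a minimal PySpark sketch (PySpark rather than R only to keep all examples in one language; the `local[1]` master, the column name `ts`, and the expected output, taken from the 'GMT+1' example in the docstring being replaced, are illustrative assumptions, not part of the patch):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[1]").getOrCreate()

# A timezone-agnostic timestamp string; Spark casts it to its internal
# microseconds-from-the-Unix-epoch representation before the function runs.
df = spark.createDataFrame([("2017-07-14 02:40:00",)], ["ts"])

# Interpret ts as UTC and shift it to GMT+1: 02:40 UTC -> 03:40 in GMT+1.
df.select(F.from_utc_timestamp(df.ts, "GMT+1").alias("gmt1")).show(truncate=False)
# +-------------------+
# |gmt1               |
# +-------------------+
# |2017-07-14 03:40:00|
# +-------------------+
```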

python/pyspark/sql/functions.py

Lines changed: 24 additions & 6 deletions
```diff
@@ -1283,9 +1283,18 @@ def unix_timestamp(timestamp=None, format='yyyy-MM-dd HH:mm:ss'):
 @since(1.5)
 def from_utc_timestamp(timestamp, tz):
     """
-    Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders
-    that time as a timestamp in the given time zone. For example, 'GMT+1' would yield
-    '2017-07-14 03:40:00.0'.
+    This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE. This function
+    takes a timestamp which is timezone-agnostic, and interprets it as a timestamp in UTC, and
+    renders that timestamp as a timestamp in the given time zone.
+
+    However, a timestamp in Spark represents the number of microseconds from the Unix epoch,
+    which is not timezone-agnostic. So in Spark this function just shifts the timestamp value
+    from the UTC time zone to the given time zone.
+
+    This function may return a confusing result if the input is a string with a timezone, e.g.
+    '2018-03-13T06:18:23+00:00'. The reason is that Spark first casts the string to a timestamp
+    according to the timezone in the string, and then displays the result by converting the
+    timestamp to a string according to the session local timezone.

     :param timestamp: the column that contains timestamps
     :param tz: a string that has the ID of timezone, e.g. "GMT", "America/Los_Angeles", etc
@@ -1308,9 +1317,18 @@ def from_utc_timestamp(timestamp, tz):
 @since(1.5)
 def to_utc_timestamp(timestamp, tz):
     """
-    Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time
-    zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield
-    '2017-07-14 01:40:00.0'.
+    This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE. This function
+    takes a timestamp which is timezone-agnostic, and interprets it as a timestamp in the given
+    time zone, and renders that timestamp as a timestamp in UTC.
+
+    However, a timestamp in Spark represents the number of microseconds from the Unix epoch,
+    which is not timezone-agnostic. So in Spark this function just shifts the timestamp value
+    from the given time zone to the UTC time zone.
+
+    This function may return a confusing result if the input is a string with a timezone, e.g.
+    '2018-03-13T06:18:23+00:00'. The reason is that Spark first casts the string to a timestamp
+    according to the timezone in the string, and then displays the result by converting the
+    timestamp to a string according to the session local timezone.

     :param timestamp: the column that contains timestamps
     :param tz: a string that has the ID of timezone, e.g. "GMT", "America/Los_Angeles", etc
```
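The reverse direction can be sketched the same way; again a hedged example, with the expected value taken from the 'GMT+1' case in the docstring being replaced and the same assumed local session setup as before:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[1]").getOrCreate()
df = spark.createDataFrame([("2017-07-14 02:40:00",)], ["ts"])

# Interpret ts as GMT+1 wall-clock time and shift it to UTC:
# 02:40 in GMT+1 -> 01:40 UTC.
df.select(F.to_utc_timestamp(df.ts, "GMT+1").alias("utc")).show(truncate=False)
# |2017-07-14 01:40:00|

# For a fixed-offset zone like GMT+1 the two functions invert each other,
# so rt renders the same as the original ts.
rt = df.select(
    F.from_utc_timestamp(F.to_utc_timestamp(df.ts, "GMT+1"), "GMT+1").alias("rt")
)
rt.show(truncate=False)
```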

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala

Lines changed: 24 additions & 6 deletions
```diff
@@ -1018,9 +1018,18 @@ case class TimeAdd(start: Expression, interval: Expression, timeZoneId: Option[S
 }

 /**
- * Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders
- * that time as a timestamp in the given time zone. For example, 'GMT+1' would yield
- * '2017-07-14 03:40:00.0'.
+ * This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE. This function
+ * takes a timestamp which is timezone-agnostic, and interprets it as a timestamp in UTC, and
+ * renders that timestamp as a timestamp in the given time zone.
+ *
+ * However, a timestamp in Spark represents the number of microseconds from the Unix epoch,
+ * which is not timezone-agnostic. So in Spark this function just shifts the timestamp value
+ * from the UTC time zone to the given time zone.
+ *
+ * This function may return a confusing result if the input is a string with a timezone, e.g.
+ * '2018-03-13T06:18:23+00:00'. The reason is that Spark first casts the string to a timestamp
+ * according to the timezone in the string, and then displays the result by converting the
+ * timestamp to a string according to the session local timezone.
  */
 // scalastyle:off line.size.limit
 @ExpressionDescription(
@@ -1215,9 +1224,18 @@ case class MonthsBetween(
 }

 /**
- * Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone,
- * and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield
- * '2017-07-14 01:40:00.0'.
+ * This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE. This function
+ * takes a timestamp which is timezone-agnostic, and interprets it as a timestamp in the given
+ * time zone, and renders that timestamp as a timestamp in UTC.
+ *
+ * However, a timestamp in Spark represents the number of microseconds from the Unix epoch,
+ * which is not timezone-agnostic. So in Spark this function just shifts the timestamp value
+ * from the given time zone to the UTC time zone.
+ *
+ * This function may return a confusing result if the input is a string with a timezone, e.g.
+ * '2018-03-13T06:18:23+00:00'. The reason is that Spark first casts the string to a timestamp
+ * according to the timezone in the string, and then displays the result by converting the
+ * timestamp to a string according to the session local timezone.
  */
 // scalastyle:off line.size.limit
 @ExpressionDescription(
```
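The "confusing result" paragraph added in all three files is easiest to see with a string that carries its own offset. A hedged PySpark sketch follows: the session timezone choice and the derived output are my own worked example under Spark 2.x cast-and-render semantics, not something shown in the patch:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[1]").getOrCreate()
# Assumed session timezone, chosen so the final rendering step is visible.
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")

df = spark.createDataFrame([("2018-03-13T06:18:23+00:00",)], ["s"])

# 1. The cast honors the '+00:00' offset inside the string, giving the
#    instant 2018-03-13 06:18:23 UTC (not the session-zone wall clock).
# 2. from_utc_timestamp then shifts that instant by GMT+1's offset.
# 3. show() renders the result in the session timezone (UTC-7 on this date),
#    so the output bears little resemblance to the '06:18:23' in the input.
df.select(F.from_utc_timestamp(df.s, "GMT+1").alias("out")).show(truncate=False)
# |2018-03-13 00:18:23|
```

A user expecting '07:18:23' here is seeing exactly the cast-then-render sequence the new doc paragraph warns about.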
