-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-31254][SQL] Use the current session time zone in HiveResult.toHiveString
#28024
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
HiveResult.toHiveString
HiveResult.toHiveString
Test build #120372 has finished for PR 28024 at commit
|
private lazy val timestampFormatter = TimestampFormatter.getFractionFormatter(zoneId) | ||
private def zoneId = DateTimeUtils.getZoneId(SQLConf.get.sessionLocalTimeZone) | ||
private def dateFormatter = DateFormatter(zoneId) | ||
private def timestampFormatter = TimestampFormatter.getFractionFormatter(zoneId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaxGekk, which codes path access SQLConf.get
here? Seems like we should clarify in the documentation that we should take other sessions into account since TimestampFormatter
behaviours can be dependent on SQL configuration when this instance is created..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's SQLConf.get.sessionLocalTimeZone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, damn, sure. Yes.
thanks, merging to master/3.0! |
…oHiveString` ### What changes were proposed in this pull request? In the PR, I propose to define `timestampFormatter`, `dateFormatter` and `zoneId` as methods of the `HiveResult` object. This should guarantee that the formatters pick the current session time zone in `toHiveString()` ### Why are the changes needed? Currently, date/timestamp formatters in `HiveResult.toHiveString` are initialized once on instantiation of the `HiveResult` object, and pick up the session time zone. If the sessions time zone is changed, the formatters still use the previous one. ### Does this PR introduce any user-facing change? Yes ### How was this patch tested? By existing test suites, in particular, by `HiveResultSuite` Closes #28024 from MaxGekk/hive-result-datetime-formatters. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 600319d) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…oHiveString` ### What changes were proposed in this pull request? In the PR, I propose to define `timestampFormatter`, `dateFormatter` and `zoneId` as methods of the `HiveResult` object. This should guarantee that the formatters pick the current session time zone in `toHiveString()` ### Why are the changes needed? Currently, date/timestamp formatters in `HiveResult.toHiveString` are initialized once on instantiation of the `HiveResult` object, and pick up the session time zone. If the sessions time zone is changed, the formatters still use the previous one. ### Does this PR introduce any user-facing change? Yes ### How was this patch tested? By existing test suites, in particular, by `HiveResultSuite` Closes apache#28024 from MaxGekk/hive-result-datetime-formatters. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…atter ### What changes were proposed in this pull request? 1. Replace `SimpleDateFormat` by `FastDateFormat` as the legacy formatter of `FractionTimestampFormatter`. 2. Optimise `LegacyFastTimestampFormatter` for `java.sql.Timestamp` w/o fractional part. ### Why are the changes needed? 1. By default `HiveResult`.`hiveResultString` retrieves timestamps values as instances of `java.sql.Timestamp`, and uses the legacy parser `SimpleDateFormat` to convert the timestamps to strings. After the fix #28024, the fractional formatter and its companion - legacy formatter `SimpleDateFormat` are created per every value. By switching from `LegacySimpleTimestampFormatter` to `LegacyFastTimestampFormatter`, we can utilize the internal cache of `FastDateFormat`, and avoid parsing the default pattern `yyyy-MM-dd HH:mm:ss`. 2. The second change in the method `def format(ts: Timestamp): String` of `LegacyFastTimestampFormatter` is needed to optimize the formatter for patterns without the fractional part and avoid conversions to microseconds. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By existing tests in `TimestampFormatter`. Closes #28678 from MaxGekk/fastdateformat-as-legacy-frac-formatter. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…atter ### What changes were proposed in this pull request? 1. Replace `SimpleDateFormat` by `FastDateFormat` as the legacy formatter of `FractionTimestampFormatter`. 2. Optimise `LegacyFastTimestampFormatter` for `java.sql.Timestamp` w/o fractional part. ### Why are the changes needed? 1. By default `HiveResult`.`hiveResultString` retrieves timestamps values as instances of `java.sql.Timestamp`, and uses the legacy parser `SimpleDateFormat` to convert the timestamps to strings. After the fix #28024, the fractional formatter and its companion - legacy formatter `SimpleDateFormat` are created per every value. By switching from `LegacySimpleTimestampFormatter` to `LegacyFastTimestampFormatter`, we can utilize the internal cache of `FastDateFormat`, and avoid parsing the default pattern `yyyy-MM-dd HH:mm:ss`. 2. The second change in the method `def format(ts: Timestamp): String` of `LegacyFastTimestampFormatter` is needed to optimize the formatter for patterns without the fractional part and avoid conversions to microseconds. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By existing tests in `TimestampFormatter`. Closes #28678 from MaxGekk/fastdateformat-as-legacy-frac-formatter. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 47dc332) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
In the PR, I propose to define
timestampFormatter
,dateFormatter
andzoneId
as methods of theHiveResult
object. This should guarantee that the formatters pick the current session time zone intoHiveString()
Why are the changes needed?
Currently, date/timestamp formatters in
HiveResult.toHiveString
are initialized once on instantiation of theHiveResult
object, and pick up the session time zone. If the sessions time zone is changed, the formatters still use the previous one.Does this PR introduce any user-facing change?
Yes
How was this patch tested?
By existing test suites, in particular, by
HiveResultSuite