Skip to content

[SPARK-31254][SQL] Use the current session time zone in HiveResult.toHiveString #28024

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Mar 25, 2020

What changes were proposed in this pull request?

In the PR, I propose to define timestampFormatter, dateFormatter and zoneId as methods of the HiveResult object. This should guarantee that the formatters pick the current session time zone in toHiveString()

Why are the changes needed?

Currently, date/timestamp formatters in HiveResult.toHiveString are initialized once on instantiation of the HiveResult object, and pick up the session time zone. If the sessions time zone is changed, the formatters still use the previous one.

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

By existing test suites, in particular, by HiveResultSuite

@MaxGekk MaxGekk changed the title [SQL][SPARK-31254] Use the current session time zone in HiveResult.toHiveString [SPARK-31254][SQL] Use the current session time zone in HiveResult.toHiveString Mar 25, 2020
@SparkQA
Copy link

SparkQA commented Mar 26, 2020

Test build #120372 has finished for PR 28024 at commit 74b98f5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

private lazy val timestampFormatter = TimestampFormatter.getFractionFormatter(zoneId)
private def zoneId = DateTimeUtils.getZoneId(SQLConf.get.sessionLocalTimeZone)
private def dateFormatter = DateFormatter(zoneId)
private def timestampFormatter = TimestampFormatter.getFractionFormatter(zoneId)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaxGekk, which codes path access SQLConf.get here? Seems like we should clarify in the documentation that we should take other sessions into account since TimestampFormatter behaviours can be dependent on SQL configuration when this instance is created..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's SQLConf.get.sessionLocalTimeZone

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, damn, sure. Yes.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in 600319d Mar 26, 2020
cloud-fan pushed a commit that referenced this pull request Mar 26, 2020
…oHiveString`

### What changes were proposed in this pull request?
In the PR, I propose to define `timestampFormatter`, `dateFormatter` and `zoneId` as methods of the `HiveResult` object. This should guarantee that the formatters pick the current session time zone in `toHiveString()`

### Why are the changes needed?
Currently, date/timestamp formatters in `HiveResult.toHiveString` are initialized once on instantiation of the `HiveResult` object, and pick up the session time zone. If the sessions time zone is changed, the formatters still use the previous one.

### Does this PR introduce any user-facing change?
Yes

### How was this patch tested?
By existing test suites, in particular, by `HiveResultSuite`

Closes #28024 from MaxGekk/hive-result-datetime-formatters.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 600319d)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…oHiveString`

### What changes were proposed in this pull request?
In the PR, I propose to define `timestampFormatter`, `dateFormatter` and `zoneId` as methods of the `HiveResult` object. This should guarantee that the formatters pick the current session time zone in `toHiveString()`

### Why are the changes needed?
Currently, date/timestamp formatters in `HiveResult.toHiveString` are initialized once on instantiation of the `HiveResult` object, and pick up the session time zone. If the sessions time zone is changed, the formatters still use the previous one.

### Does this PR introduce any user-facing change?
Yes

### How was this patch tested?
By existing test suites, in particular, by `HiveResultSuite`

Closes apache#28024 from MaxGekk/hive-result-datetime-formatters.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request May 31, 2020
…atter

### What changes were proposed in this pull request?
1. Replace `SimpleDateFormat` by `FastDateFormat` as the legacy formatter of `FractionTimestampFormatter`.
2. Optimise `LegacyFastTimestampFormatter` for `java.sql.Timestamp` w/o fractional part.

### Why are the changes needed?
1. By default `HiveResult`.`hiveResultString` retrieves timestamps values as instances of `java.sql.Timestamp`, and uses the legacy parser `SimpleDateFormat` to convert the timestamps to strings. After the fix #28024, the fractional formatter and its companion - legacy formatter `SimpleDateFormat` are created per every value. By switching from `LegacySimpleTimestampFormatter` to `LegacyFastTimestampFormatter`, we can utilize the internal cache of `FastDateFormat`, and avoid parsing the default pattern `yyyy-MM-dd HH:mm:ss`.
2. The second change in the method `def format(ts: Timestamp): String` of `LegacyFastTimestampFormatter` is needed to optimize the formatter for patterns without the fractional part and avoid conversions to microseconds.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By existing tests in `TimestampFormatter`.

Closes #28678 from MaxGekk/fastdateformat-as-legacy-frac-formatter.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request May 31, 2020
…atter

### What changes were proposed in this pull request?
1. Replace `SimpleDateFormat` by `FastDateFormat` as the legacy formatter of `FractionTimestampFormatter`.
2. Optimise `LegacyFastTimestampFormatter` for `java.sql.Timestamp` w/o fractional part.

### Why are the changes needed?
1. By default `HiveResult`.`hiveResultString` retrieves timestamps values as instances of `java.sql.Timestamp`, and uses the legacy parser `SimpleDateFormat` to convert the timestamps to strings. After the fix #28024, the fractional formatter and its companion - legacy formatter `SimpleDateFormat` are created per every value. By switching from `LegacySimpleTimestampFormatter` to `LegacyFastTimestampFormatter`, we can utilize the internal cache of `FastDateFormat`, and avoid parsing the default pattern `yyyy-MM-dd HH:mm:ss`.
2. The second change in the method `def format(ts: Timestamp): String` of `LegacyFastTimestampFormatter` is needed to optimize the formatter for patterns without the fractional part and avoid conversions to microseconds.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By existing tests in `TimestampFormatter`.

Closes #28678 from MaxGekk/fastdateformat-as-legacy-frac-formatter.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 47dc332)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@MaxGekk MaxGekk deleted the hive-result-datetime-formatters branch June 5, 2020 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants