Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented May 30, 2020

What changes were proposed in this pull request?

  1. Replace SimpleDateFormat by FastDateFormat as the legacy formatter of FractionTimestampFormatter.
  2. Optimise LegacyFastTimestampFormatter for java.sql.Timestamp w/o fractional part.

Why are the changes needed?

  1. By default HiveResult.hiveResultString retrieves timestamps values as instances of java.sql.Timestamp, and uses the legacy parser SimpleDateFormat to convert the timestamps to strings. After the fix [SPARK-31254][SQL] Use the current session time zone in HiveResult.toHiveString #28024, the fractional formatter and its companion - legacy formatter SimpleDateFormat are created per every value. By switching from LegacySimpleTimestampFormatter to LegacyFastTimestampFormatter, we can utilize the internal cache of FastDateFormat, and avoid parsing the default pattern yyyy-MM-dd HH:mm:ss.
  2. The second change in the method def format(ts: Timestamp): String of LegacyFastTimestampFormatter is needed to optimize the formatter for patterns without the fractional part and avoid conversions to microseconds.

Does this PR introduce any user-facing change?

No

How was this patch tested?

By existing tests in TimestampFormatter.

@SparkQA
Copy link

SparkQA commented May 30, 2020

Test build #123317 has finished for PR 28678 at commit 5e050ea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk MaxGekk changed the title [WIP][SQL] Use FastDateFormat as the legacy fractional formatter [SQL] Use FastDateFormat as the legacy fractional formatter May 30, 2020
@MaxGekk
Copy link
Member Author

MaxGekk commented May 30, 2020

@cloud-fan @HyukjinKwon @juliuszsompolski Please, review this PR.

@MaxGekk MaxGekk changed the title [SQL] Use FastDateFormat as the legacy fractional formatter [MINOR][SQL] Use FastDateFormat as the legacy fractional formatter May 30, 2020
@MaxGekk MaxGekk changed the title [MINOR][SQL] Use FastDateFormat as the legacy fractional formatter [SPARK-31874][SQL] Use FastDateFormat as the legacy fractional formatter May 30, 2020
TimestampFormatter.defaultPattern,
zoneId,
TimestampFormatter.defaultLocale,
LegacyDateFormats.FAST_DATE_FORMAT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it the same with Spark 2.4?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark 2.4 uses SimpleDateFormat but output of FastDateFormat and SimpleDateFormat for the concrete pattern yyyy-MM-dd HH:mm:ss is the same. At least, all our tests show no difference.

@cloud-fan cloud-fan closed this in 47dc332 May 31, 2020
@cloud-fan
Copy link
Contributor

thanks, merging to master/3.0!

cloud-fan pushed a commit that referenced this pull request May 31, 2020
…atter

### What changes were proposed in this pull request?
1. Replace `SimpleDateFormat` by `FastDateFormat` as the legacy formatter of `FractionTimestampFormatter`.
2. Optimise `LegacyFastTimestampFormatter` for `java.sql.Timestamp` w/o fractional part.

### Why are the changes needed?
1. By default `HiveResult`.`hiveResultString` retrieves timestamps values as instances of `java.sql.Timestamp`, and uses the legacy parser `SimpleDateFormat` to convert the timestamps to strings. After the fix #28024, the fractional formatter and its companion - legacy formatter `SimpleDateFormat` are created per every value. By switching from `LegacySimpleTimestampFormatter` to `LegacyFastTimestampFormatter`, we can utilize the internal cache of `FastDateFormat`, and avoid parsing the default pattern `yyyy-MM-dd HH:mm:ss`.
2. The second change in the method `def format(ts: Timestamp): String` of `LegacyFastTimestampFormatter` is needed to optimize the formatter for patterns without the fractional part and avoid conversions to microseconds.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By existing tests in `TimestampFormatter`.

Closes #28678 from MaxGekk/fastdateformat-as-legacy-frac-formatter.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 47dc332)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@HyukjinKwon
Copy link
Member

+1

1 similar comment
@juliuszsompolski
Copy link
Contributor

+1

@MaxGekk MaxGekk deleted the fastdateformat-as-legacy-frac-formatter branch June 5, 2020 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants