Skip to content

[SPARK-31892][SQL] Disable week-based date filed for parsing #28706

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

[SPARK-31892][SQL] Disable week-based date filed for parsing #28706

wants to merge 4 commits into from

Conversation

yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Jun 2, 2020

What changes were proposed in this pull request?

This PR disables week-based date filed for parsing

closes #28674

Why are the changes needed?

  1. It's an un-fixable behavior change to fill the gap between SimpleDateFormat and DateTimeFormater and backward-compatibility for different JDKs.A lot of effort has been made to prove it at [SPARK-31868][SQL] Restore the parsing behaviour week-based-year for 2.4 #28674

  2. The existing behavior itself in 2.4 is confusing, e.g.

spark-sql> select to_timestamp('1', 'w');
1969-12-28 00:00:00
spark-sql> select to_timestamp('1', 'u');
1970-01-05 00:00:00

The 'u' here seems not to go to the Monday of the first week in week-based form or the first day of the year in non-week-based form but go to the Monday of the second week in week-based form.

And, e.g.

spark-sql> select to_timestamp('2020 2020', 'YYYY yyyy');
2020-01-01 00:00:00
spark-sql> select to_timestamp('2020 2020', 'yyyy YYYY');
2019-12-29 00:00:00
spark-sql> select to_timestamp('2020 2020 1', 'YYYY yyyy w');
NULL
spark-sql> select to_timestamp('2020 2020 1', 'yyyy YYYY w');
2019-12-29 00:00:00

I think we don't need to introduce all the weird behavior from Java

  1. The current test coverage for week-based date fields is almost 0%, which indicates that we've never imagined using it.

  2. Avoiding JDK bugs

https://issues.apache.org/jira/browse/SPARK-31880

Does this PR introduce any user-facing change?

Yes, the 'Y/W/w/u/F/E' pattern cannot be used datetime parsing functions.

How was this patch tested?

more tests added

@@ -136,6 +136,8 @@ The count of pattern letters determines the format.
During formatting, all valid data will be output even it is in the optional section.
During parsing, the whole section may be missing from the parsed string.
An optional section is started by `[` and ended using `]` (or at the end of the pattern).

- Symbols of 'Y', 'W', 'w', 'E', 'u', 'F' can only be used for datetime formatting, e.g. `date_format`.

More details for the text style:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to this PR, but can we embed this section into Text: ...?

@SparkQA
Copy link

SparkQA commented Jun 2, 2020

Test build #123429 has finished for PR 28706 at commit 12d2757.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 2, 2020

Test build #123430 has finished for PR 28706 at commit b0413d3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 2, 2020

Test build #123432 has finished for PR 28706 at commit c364ff3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Copy link
Member Author

yaooqinn commented Jun 2, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Jun 2, 2020

Test build #123440 has finished for PR 28706 at commit c364ff3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 3, 2020

Test build #123455 has finished for PR 28706 at commit 419a685.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in afe95bd Jun 3, 2020
cloud-fan pushed a commit that referenced this pull request Jun 3, 2020
This PR disables week-based date filed for parsing

closes #28674

1. It's an un-fixable behavior change to fill the gap between SimpleDateFormat and DateTimeFormater and backward-compatibility for different JDKs.A lot of effort has been made to prove it at #28674

2. The existing behavior itself in 2.4 is confusing, e.g.

```sql
spark-sql> select to_timestamp('1', 'w');
1969-12-28 00:00:00
spark-sql> select to_timestamp('1', 'u');
1970-01-05 00:00:00
```
  The 'u' here seems not to go to the Monday of the first week in week-based form or the first day of the year in non-week-based form but go to the Monday of the second week in week-based form.

And, e.g.
```sql
spark-sql> select to_timestamp('2020 2020', 'YYYY yyyy');
2020-01-01 00:00:00
spark-sql> select to_timestamp('2020 2020', 'yyyy YYYY');
2019-12-29 00:00:00
spark-sql> select to_timestamp('2020 2020 1', 'YYYY yyyy w');
NULL
spark-sql> select to_timestamp('2020 2020 1', 'yyyy YYYY w');
2019-12-29 00:00:00
```

  I think we don't need to introduce all the weird behavior from Java

3. The current test coverage for week-based date fields is almost 0%, which indicates that we've never imagined using it.

4. Avoiding JDK bugs

https://issues.apache.org/jira/browse/SPARK-31880

Yes, the 'Y/W/w/u/F/E' pattern cannot be used datetime parsing functions.

more tests added

Closes #28706 from yaooqinn/SPARK-31892.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit afe95bd)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants