-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-31892][SQL] Disable week-based date filed for parsing #28706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
Outdated
Show resolved
Hide resolved
...atalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelperSuite.scala
Show resolved
Hide resolved
@@ -136,6 +136,8 @@ The count of pattern letters determines the format. | |||
During formatting, all valid data will be output even it is in the optional section. | |||
During parsing, the whole section may be missing from the parsed string. | |||
An optional section is started by `[` and ended using `]` (or at the end of the pattern). | |||
|
|||
- Symbols of 'Y', 'W', 'w', 'E', 'u', 'F' can only be used for datetime formatting, e.g. `date_format`. | |||
|
|||
More details for the text style: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not related to this PR, but can we embed this section into Text: ...
?
Test build #123429 has finished for PR 28706 at commit
|
Test build #123430 has finished for PR 28706 at commit
|
Test build #123432 has finished for PR 28706 at commit
|
retest this please |
Test build #123440 has finished for PR 28706 at commit
|
Test build #123455 has finished for PR 28706 at commit
|
thanks, merging to master/3.0! |
This PR disables week-based date filed for parsing closes #28674 1. It's an un-fixable behavior change to fill the gap between SimpleDateFormat and DateTimeFormater and backward-compatibility for different JDKs.A lot of effort has been made to prove it at #28674 2. The existing behavior itself in 2.4 is confusing, e.g. ```sql spark-sql> select to_timestamp('1', 'w'); 1969-12-28 00:00:00 spark-sql> select to_timestamp('1', 'u'); 1970-01-05 00:00:00 ``` The 'u' here seems not to go to the Monday of the first week in week-based form or the first day of the year in non-week-based form but go to the Monday of the second week in week-based form. And, e.g. ```sql spark-sql> select to_timestamp('2020 2020', 'YYYY yyyy'); 2020-01-01 00:00:00 spark-sql> select to_timestamp('2020 2020', 'yyyy YYYY'); 2019-12-29 00:00:00 spark-sql> select to_timestamp('2020 2020 1', 'YYYY yyyy w'); NULL spark-sql> select to_timestamp('2020 2020 1', 'yyyy YYYY w'); 2019-12-29 00:00:00 ``` I think we don't need to introduce all the weird behavior from Java 3. The current test coverage for week-based date fields is almost 0%, which indicates that we've never imagined using it. 4. Avoiding JDK bugs https://issues.apache.org/jira/browse/SPARK-31880 Yes, the 'Y/W/w/u/F/E' pattern cannot be used datetime parsing functions. more tests added Closes #28706 from yaooqinn/SPARK-31892. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit afe95bd) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
This PR disables week-based date filed for parsing
closes #28674
Why are the changes needed?
It's an un-fixable behavior change to fill the gap between SimpleDateFormat and DateTimeFormater and backward-compatibility for different JDKs.A lot of effort has been made to prove it at [SPARK-31868][SQL] Restore the parsing behaviour week-based-year for 2.4 #28674
The existing behavior itself in 2.4 is confusing, e.g.
The 'u' here seems not to go to the Monday of the first week in week-based form or the first day of the year in non-week-based form but go to the Monday of the second week in week-based form.
And, e.g.
I think we don't need to introduce all the weird behavior from Java
The current test coverage for week-based date fields is almost 0%, which indicates that we've never imagined using it.
Avoiding JDK bugs
https://issues.apache.org/jira/browse/SPARK-31880
Does this PR introduce any user-facing change?
Yes, the 'Y/W/w/u/F/E' pattern cannot be used datetime parsing functions.
How was this patch tested?
more tests added