-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-26002][SQL] Fix day of year calculation for Julian calendar days #23000
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Test build #98682 has finished for PR 23000 at commit
|
ping @10110346 @gatorsmile |
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
Show resolved
Hide resolved
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
Outdated
Show resolved
Hide resolved
lgtm |
Test build #99054 has finished for PR 23000 at commit
|
Retest this please. |
Test build #100216 has finished for PR 23000 at commit
|
cc @cloud-fan , @gatorsmile , @mgaido91 |
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
Show resolved
Hide resolved
what was the behavior before this PR? |
@cloud-fan It was off by one (at least at 1500, with each 100 years back in time the difference increases) as I described in the Jira:
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
Show resolved
Hide resolved
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
Show resolved
Hide resolved
cc @MaxGekk who is touching this |
@gatorsmile Thanx for the ping. Actually current implementation support Proleptic Gregorian calendar even if it is not documented explicitly. Recently we switched on this chronology for parsing/formatting date/timestamps in JSON/CSV datasource (see #23150 and #23196) and in functions (see #23358). If we support the hybrid calendar (Julian + Gregorian) here, it will introduce additional inconsistency. @attilapiros If you need the hybrid calendar for some reasons, you should extend the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in case, SQL standard counts dates exclusively in the Gregorian calendar, even for years before that calendar was in use.
@MaxGekk thanks for the explanation, but I still a bit uncertain. My main intention was about correcting the dayOfYear/month/year operators as the result in the following example are very strange (and not consistent with PostgreSQL too):
Running the corresponding expressions in PostgreSQL 9.6:
Is there any other DBMS I can take as a reference? |
I see, we still use hybrid calendar in casting spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala Lines 414 to 430 in 7c7fccf
by using java.time API based on Proleptic Gregorian Calendar but such changes look like more heavy. I would guess they will be done to follow SQL standard in the end, so, your fix for Julian calendar will be not needed. At the moment, I agree we need your fix to handle results from stringToTimestamp properly.
|
Ok. Then tomorrow I will add the requested tests and we can progress with this PR. |
Test build #100692 has finished for PR 23000 at commit
|
gentle ping @dongjoon-hyun, @cloud-fan, @gatorsmile |
thanks, merging to master! |
## What changes were proposed in this pull request? Fixing leap year calculations for date operators (year/month/dayOfYear) where the Julian calendars are used (before 1582-10-04). In a Julian calendar every years which are multiples of 4 are leap years (there is no extra exception for years multiples of 100). ## How was this patch tested? With a unit test ("SPARK-26002: correct day of year calculations for Julian calendar years") which focuses to these corner cases. Manually: ``` scala> sql("select year('1500-01-01')").show() +------------------------------+ |year(CAST(1500-01-01 AS DATE))| +------------------------------+ | 1500| +------------------------------+ scala> sql("select dayOfYear('1100-01-01')").show() +-----------------------------------+ |dayofyear(CAST(1100-01-01 AS DATE))| +-----------------------------------+ | 1| +-----------------------------------+ ``` Closes apache#23000 from attilapiros/julianOffByDays. Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Fixing leap year calculations for date operators (year/month/dayOfYear) where the Julian calendars are used (before 1582-10-04). In a Julian calendar every years which are multiples of 4 are leap years (there is no extra exception for years multiples of 100).
How was this patch tested?
With a unit test ("SPARK-26002: correct day of year calculations for Julian calendar years") which focuses to these corner cases.
Manually: