-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-31959][SQL][test-java11] Fix Gregorian-Julian micros rebasing while switching standard time zone offset #28787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #123775 has finished for PR 28787 at commit
|
|
The build #28787 (comment) failed on the assert for JDK calls: val ldt = LocalDateTime.of(1945, 11, 18, 1, 30, 0)
val earlierMicros = instantToMicros(ldt.atZone(hkZid).withEarlierOffsetAtOverlap().toInstant)
val laterMicros = instantToMicros(ldt.atZone(hkZid).withLaterOffsetAtOverlap().toInstant)
assert(earlierMicros + MICROS_PER_HOUR === laterMicros)because jenkins uses "old" JDK: which has an outdated time zone database, see https://bugs.openjdk.java.net/browse/JDK-8228469 @cloud-fan @HyukjinKwon @dongjoon-hyun @srowen Can we upgrade JDK on jenkins machines? |
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/RebaseDateTimeSuite.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
Show resolved
Hide resolved
|
jenkins, retest this, please |
|
Test build #123836 has finished for PR 28787 at commit
|
|
Test build #123846 has finished for PR 28787 at commit
|
|
More context: To do datetime rebase more efficiently, we put the diffs between two calendars in JSON files, to avoid calculating the diffs every time. To generate the JSON file, we run the rebase function for all the datetime values from year 0001 to year 2037, using one week as the step. This is not accurate and we miss diffs during some short period. The good news is: although the JSON files do not match the rebase function, they are correct. The rebase function has a bug. We have a random test to make sure the rebase function matches the generated JSON files, and that's why we found this bug. That said, this is not a real bug as it doesn't affect end users, but it makes the test flaky. |
|
thanks, merging to master! |
…while switching standard time zone offset Fix the bug in microseconds rebasing during transitions from one standard time zone offset to another one. In the PR, I propose to change the implementation of `rebaseGregorianToJulianMicros` which performs rebasing via local timestamps. In the case of overlapping: 1. Check that the original instant belongs to earlier or later instant of overlapped local timestamp. 2. If it is an earlier instant, take zone and DST offsets from the previous day otherwise 3. Set time zone offsets to Julian timestamp from the next day. Note: The fix assumes that transitions cannot happen more often than once per 2 days. Current implementation handles timestamps overlapping only during daylight saving time but overlapping can happen also during transition from one standard time zone to another one. For example in the case of `Asia/Hong_Kong`, the time zone switched from `Japan Standard Time` (UTC+9) to `Hong Kong Time` (UTC+8) on _Sunday, 18 November, 1945 01:59:59 AM_. The changes allow to handle the special case as well. It might affect micros rebasing in before common era when not-optimised version of `rebaseGregorianToJulianMicros()` is used directly. 1. By existing tests in `DateTimeUtilsSuite`, `RebaseDateTimeSuite`, `DateFunctionsSuite`, `DateExpressionsSuite` and `TimestampFormatterSuite`. 2. Added new test to `RebaseDateTimeSuite` 3. Regenerated `gregorian-julian-rebase-micros.json` with the step of 30 minutes, and got the same JSON file. The JSON file isn't affected because previously it was generated with the step of 1 week. And the spike in diffs/switch points during 1 hour of timestamp overlapping wasn't detected. Closes apache#28787 from MaxGekk/HongKong-tz-1945. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit c259844) Signed-off-by: Max Gekk <max.gekk@gmail.com>
|
Hi, All. As I commented on the backporting PR, this causes a weird consistent failure on |
…s pure
### What changes were proposed in this pull request?
1. Set the given time zone as the first parameter of `RebaseDateTime`.`rebaseJulianToGregorianMicros()` and `rebaseGregorianToJulianMicros()` to Java 7 `GregorianCalendar`.
```scala
val cal = new Calendar.Builder()
// `gregory` is a hybrid calendar that supports both the Julian and Gregorian calendar systems
.setCalendarType("gregory")
...
.setTimeZone(tz)
.build()
```
This makes the instance of the calendar independent from the default JVM time zone.
2. Change type of the first parameter from `ZoneId` to `TimeZone`. This allows to avoid unnecessary conversion from `TimeZone` to `ZoneId`, for example in
```scala
def rebaseJulianToGregorianMicros(micros: Long): Long = {
...
if (rebaseRecord == null || micros < rebaseRecord.switches(0)) {
rebaseJulianToGregorianMicros(timeZone.toZoneId, micros)
```
and back to `TimeZone` inside of `rebaseJulianToGregorianMicros(zoneId: ZoneId, ...)`
3. Modify tests in `RebaseDateTimeSuite`, and set the default JVM time zone only for functions that depend on it.
### Why are the changes needed?
1. Ignoring passed parameter and using a global variable is bad practice.
2. Dependency from the global state doesn't allow to run the functions in parallel otherwise there is non-zero probability that the functions may return wrong result if the default JVM is changed during their execution.
3. This open opportunity for parallelisation of JSON files generation `gregorian-julian-rebase-micros.json` and `julian-gregorian-rebase-micros.json`. Currently, the tests `generate 'gregorian-julian-rebase-micros.json'` and `generate 'julian-gregorian-rebase-micros.json'` generate the JSON files by iterating over all time zones sequentially w/ step of 1 week. Due to the large step, we can miss some spikes in diffs between 2 calendars (Java 8 Gregorian and Java 7 hybrid calendars) as the PR #28787 fixed and #28816 should fix.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By running existing tests from `RebaseDateTimeSuite`.
Closes #28824 from MaxGekk/pure-micros-rebasing.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? 1. Change the max step from 1 week to 30 minutes in the tests `RebaseDateTimeSuite`.`generate 'gregorian-julian-rebase-micros.json'` and `generate 'julian-gregorian-rebase-micros.json'`. 2. Parallelise JSON files generation in the function `generateRebaseJson` by using `ThreadUtils.parmap`. ### Why are the changes needed? 1. To prevent the bugs that are fixed by #28787 and #28816. 2. The parallelisation speeds up JSON file generation. ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? By generating the JSON file `julian-gregorian-rebase-micros.json`. Closes #28827 from MaxGekk/rebase-30-min. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
What changes were proposed in this pull request?
Fix the bug in microseconds rebasing during transitions from one standard time zone offset to another one. In the PR, I propose to change the implementation of
rebaseGregorianToJulianMicroswhich performs rebasing via local timestamps. In the case of overlapping:Note: The fix assumes that transitions cannot happen more often than once per 2 days.
Why are the changes needed?
Current implementation handles timestamps overlapping only during daylight saving time but overlapping can happen also during transition from one standard time zone to another one. For example in the case of
Asia/Hong_Kong, the time zone switched fromJapan Standard Time(UTC+9) toHong Kong Time(UTC+8) on Sunday, 18 November, 1945 01:59:59 AM. The changes allow to handle the special case as well.Does this PR introduce any user-facing change?
It might affect micros rebasing in before common era when not-optimised version of
rebaseGregorianToJulianMicros()is used directly.How was this patch tested?
DateTimeUtilsSuite,RebaseDateTimeSuite,DateFunctionsSuite,DateExpressionsSuiteandTimestampFormatterSuite.RebaseDateTimeSuitegregorian-julian-rebase-micros.jsonwith the step of 30 minutes, and got the same JSON file. The JSON file isn't affected because previously it was generated with the step of 1 week. And the spike in diffs/switch points during 1 hour of timestamp overlapping wasn't detected.