-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-30760][SQL] Port millisToDays and daysToMillis on Java 8 time API
#27494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| val maxMonthInterval = 10000 * 12 | ||
| checkEvaluation( | ||
| AddMonths(Literal(Date.valueOf("0001-01-01")), Literal(maxMonthInterval)), 2933261) | ||
| AddMonths(Literal(LocalDate.parse("0001-01-01")), Literal(maxMonthInterval)), 2933263) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to replace Date.valueOf by LocalDate.parse due to migration on another calendar from Julian to Gregorian. The expected number of days is easy to get:
val a = LocalDate.of(10001, 1, 1).toEpochDay
println(a)
2933263There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we place the hardcoded 2933263 with LocalDate.of(10001, 1, 1).toEpochDay?
|
Test build #118046 has finished for PR 27494 at commit
|
|
Test build #118045 has finished for PR 27494 at commit
|
|
Test build #118061 has finished for PR 27494 at commit
|
|
Test build #118072 has finished for PR 27494 at commit
|
| def checkFromToJavaDate(d1: Date): Unit = { | ||
| val d2 = toJavaDate(fromJavaDate(d1)) | ||
| assert(d2.toString === d1.toString) | ||
| assert(format(d2) === format(d1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to use the formatter to avoid calendar problems - Date.toString formats slightly differently than DateTimeFormatter, I guess because of different year lengths in calendars.
millisToDays and daysToMillis on Java 8 time APImillisToDays and daysToMillis on Java 8 time API
|
jenkins, retest this, please |
|
Test build #118076 has finished for PR 27494 at commit
|
| struct<date_trunc(DECADE, CAST(to_date('0002-12-31 BC', 'yyyy-MM-dd G') AS TIMESTAMP)):timestamp> | ||
| -- !query output | ||
| -0010-01-01 00:07:02 | ||
| -0010-01-01 00:00:00 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to ask if there are any behavior changes. This is one, but it looks like it was wrong before?
Are there any others that we know of? Given recent discussions I'm extra cautious about introducing behavior changes I suppose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This had been already described in the SQL migration guide. The numbers are different because internally we use Proleptic Gregorian calendar in date_trunc but when we collect data to the driver, we convert internal timestamps to java.sql.Timestamp. Actually, we switch to Julian calendar for old dates here.
By settings localSparkSession.conf.set(SQLConf.DATETIME_JAVA8API_ENABLED.key, true) in SQLQueryTestSuite.scala, I just keep the same calendar in all transformations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is caused by https://github.com/apache/spark/pull/27494/files#diff-432455394ca50800d5de508861984ca5R340 , not the fix itself, right?
Do we have any known behavior changes caused by this fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the change in this particular line is caused by localSparkSession.conf.set(SQLConf.DATETIME_JAVA8API_ENABLED.key, true). But I had to set the settings to fix another test failures like in #27494 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's helpful to do #27494 (comment) first and then we can know what gets changed exactly by this fix?
| localSparkSession.conf.set(SQLConf.ANSI_ENABLED.key, true) | ||
| case _ => | ||
| } | ||
| localSparkSession.conf.set(SQLConf.DATETIME_JAVA8API_ENABLED.key, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a bug in HiveResult, not just this test suite.
When we produce string result, we must stick with the Proleptic Gregorian calendar and call LocalDate/Instant.toString. I think we should update HiveResult.hiveResultString and enable java 8 datetime API before calling collect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HiveResult is used in hive-thriftserver too. If I set spark.sql.datetime.java8API.enabled to true in HiveResult.hiveResultString, this might effect on the Thrift server. Though I can correct expected results in the tests, but Thrift server may require broader changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we open a new PR to do it? I believe we need to do this after switching to Proleptic Gregorian calendar. We should avoid using different calendars inconsistently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will do that after this PR will be merged because I will need the changes, most likely.
|
Test build #118152 has finished for PR 27494 at commit
|
|
jenkins, retest this, please |
|
Test build #118166 has finished for PR 27494 at commit
|
|
are there any known behavior changes (bug fixes) caused by this patch? |
NOTE: This affects only old dates/timestamps before 1582 year. |
|
OK let's get this in 3.0 to stick with Proleptic Gregorian calendar. Can we fix the conflicts? |
…ys-java8-api # Conflicts: # sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
|
Test build #118300 has finished for PR 27494 at commit
|
|
thanks, merging to master/3.0! @MaxGekk let's fix #27494 (comment) in a followup. |
…ime API ### What changes were proposed in this pull request? In the PR, I propose to rewrite the `millisToDays` and `daysToMillis` of `DateTimeUtils` using Java 8 time API. I removed `getOffsetFromLocalMillis` from `DateTimeUtils` because it is a private methods, and is not used anymore in Spark SQL. ### Why are the changes needed? New implementation is based on Proleptic Gregorian calendar which has been already used by other date-time functions. This changes make `millisToDays` and `daysToMillis` consistent to rest Spark SQL API related to date & time operations. ### Does this PR introduce any user-facing change? Yes, this might effect behavior for old dates before 1582 year. ### How was this patch tested? By existing test suites `DateTimeUtilsSuite`, `DateFunctionsSuite`, DateExpressionsSuite`, `SQLQuerySuite` and `HiveResultSuite`. Closes #27494 from MaxGekk/millis-2-days-java8-api. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit aa0d136) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…ime API ### What changes were proposed in this pull request? In the PR, I propose to rewrite the `millisToDays` and `daysToMillis` of `DateTimeUtils` using Java 8 time API. I removed `getOffsetFromLocalMillis` from `DateTimeUtils` because it is a private methods, and is not used anymore in Spark SQL. ### Why are the changes needed? New implementation is based on Proleptic Gregorian calendar which has been already used by other date-time functions. This changes make `millisToDays` and `daysToMillis` consistent to rest Spark SQL API related to date & time operations. ### Does this PR introduce any user-facing change? Yes, this might effect behavior for old dates before 1582 year. ### How was this patch tested? By existing test suites `DateTimeUtilsSuite`, `DateFunctionsSuite`, DateExpressionsSuite`, `SQLQuerySuite` and `HiveResultSuite`. Closes apache#27494 from MaxGekk/millis-2-days-java8-api. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? 1. Move TimeZoneUTC and TimeZoneGMT to DateTimeTestUtils 2. Remove TimeZoneGMT 3. Use ZoneId.systemDefault() instead of defaultTimeZone().toZoneId 4. Alias SQLDate & SQLTimestamp to internal types of DateType and TimestampType 5. Avoid one `*` `DateTimeUtils`.`in fromJulianDay()` 6. Use toTotalMonths in `DateTimeUtils`.`subtractDates()` 7. Remove `julianCommonEraStart`, `timestampToString()`, `microsToEpochDays()`, `epochDaysToMicros()`, `instantToDays()` from `DateTimeUtils`. 8. Make splitDate() private. 9. Remove `def daysToMicros(days: Int): Long` and `def microsToDays(micros: Long): Int`. ### Why are the changes needed? This simplifies the common code related to date-time operations, and should improve maintainability. In particular: 1. TimeZoneUTC and TimeZoneGMT are moved to DateTimeTestUtils because they are used only in tests 2. TimeZoneGMT can be removed because it is equal to TimeZoneUTC 3. After the PR #27494, Spark expressions and DateTimeUtils functions switched to ZoneId instead of TimeZone completely. `defaultTimeZone()` with `TimeZone` as return type is not needed anymore. 4. SQLDate and SQLTimestamp types can be explicitly aliased to internal types of DateType and and TimestampType instead of declaring this in a comment. 5. Avoid one `*` `DateTimeUtils`.`in fromJulianDay()`. 6. Use toTotalMonths in `DateTimeUtils`.`subtractDates()`. ### Does this PR introduce any user-facing change? No ### How was this patch tested? By existing test suites Closes #27617 from MaxGekk/move-time-zone-consts. Lead-authored-by: Max Gekk <max.gekk@gmail.com> Co-authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
In the PR, I propose to rewrite the
millisToDaysanddaysToMillisofDateTimeUtilsusing Java 8 time API.I removed
getOffsetFromLocalMillisfromDateTimeUtilsbecause it is a private methods, and is not used anymore in Spark SQL.Why are the changes needed?
New implementation is based on Proleptic Gregorian calendar which has been already used by other date-time functions. This changes make
millisToDaysanddaysToMillisconsistent to rest Spark SQL API related to date & time operations.Does this PR introduce any user-facing change?
Yes, this might effect behavior for old dates before 1582 year.
How was this patch tested?
By existing test suites
DateTimeUtilsSuite,DateFunctionsSuite, DateExpressionsSuite,SQLQuerySuiteandHiveResultSuite`.