
Commit 1d1ed3e

[SPARK-35107][SQL] Parse unit-to-unit interval literals to ANSI intervals
### What changes were proposed in this pull request?

Parse year-month interval literals like `INTERVAL '1-1' YEAR TO MONTH` to values of `YearMonthIntervalType`, and day-time interval literals to `DayTimeIntervalType` values. Currently, Spark SQL supports:

- DAY TO HOUR
- DAY TO MINUTE
- DAY TO SECOND
- HOUR TO MINUTE
- HOUR TO SECOND
- MINUTE TO SECOND

All such interval literals are converted to `DayTimeIntervalType`, and `YEAR TO MONTH` to `YearMonthIntervalType`, while losing info about the `from` and `to` units.

**Note**: the new behavior is under the SQL config `spark.sql.legacy.interval.enabled`, which is `false` by default. When the config is set to `true`, the interval literals are parsed to `CalendarIntervalType` values.

Closes #32176

### Why are the changes needed?

To conform to the ANSI SQL standard, which assumes conversions of interval literals to year-month or day-time interval types, but not to a mixed interval type like Catalyst's `CalendarIntervalType`.

### Does this PR introduce _any_ user-facing change?

Yes.

Before:
```sql
spark-sql> SELECT INTERVAL '1 01:02:03.123' DAY TO SECOND;
1 days 1 hours 2 minutes 3.123 seconds
spark-sql> SELECT typeof(INTERVAL '1 01:02:03.123' DAY TO SECOND);
interval
```

After:
```sql
spark-sql> SELECT INTERVAL '1 01:02:03.123' DAY TO SECOND;
1 01:02:03.123000000
spark-sql> SELECT typeof(INTERVAL '1 01:02:03.123' DAY TO SECOND);
day-time interval
```

### How was this patch tested?

1. By running the affected test suites:
```
$ ./build/sbt "test:testOnly *.ExpressionParserSuite"
$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *SQLQueryTestSuite -- -z interval.sql"
$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *SQLQueryTestSuite -- -z create_view.sql"
$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *SQLQueryTestSuite -- -z date.sql"
$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *SQLQueryTestSuite -- -z timestamp.sql"
```
2. PostgreSQL tests are executed with `spark.sql.legacy.interval.enabled` set to `true` to keep compatibility with PostgreSQL output:
```sql
> SELECT interval '999' second;
0 years 0 mons 0 days 0 hours 16 mins 39.00 secs
```

Closes #32209 from MaxGekk/parse-ansi-interval-literals.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
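To make the new mapping concrete, here is a minimal standalone sketch (not code from the patch) of the dispatch rule the parser now applies. `CalendarInterval`, `Literal`, `IntervalUtils.getDuration`, and the two ANSI types are the real Catalyst classes used in the diffs below; the free-standing `toAnsiIntervalLiteral` helper is hypothetical:

```scala
import java.util.concurrent.TimeUnit

import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.util.IntervalUtils
import org.apache.spark.sql.types.{DayTimeIntervalType, YearMonthIntervalType}
import org.apache.spark.unsafe.types.CalendarInterval

// Hypothetical helper mirroring the new visitInterval branch: the `to` unit
// of the literal decides the result type, because the parsed CalendarInterval
// alone cannot (e.g. new CalendarInterval(0, 0, 0) fits both kinds).
def toAnsiIntervalLiteral(iv: CalendarInterval, toUnit: String): Literal =
  if (toUnit == "month") {
    Literal(iv.months, YearMonthIntervalType) // whole months as an Int
  } else {
    // Total duration in microseconds as a Long
    Literal(IntervalUtils.getDuration(iv, TimeUnit.MICROSECONDS), DayTimeIntervalType)
  }
```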
1 parent: 8dc455b

11 files changed (+254, −194 lines)

docs/sql-migration-guide.md

Lines changed: 2 additions & 0 deletions

```diff
@@ -81,6 +81,8 @@ license: |
 
 - In Spark 3.2, `TRANSFORM` operator can support `ArrayType/MapType/StructType` without Hive SerDe, in this mode, we use `StructsToJosn` to convert `ArrayType/MapType/StructType` column to `STRING` and use `JsonToStructs` to parse `STRING` to `ArrayType/MapType/StructType`. In Spark 3.1, Spark just support case `ArrayType/MapType/StructType` column as `STRING` but can't support parse `STRING` to `ArrayType/MapType/StructType` output columns.
 
+- In Spark 3.2, the unit-to-unit interval literals like `INTERVAL '1-1' YEAR TO MONTH` are converted to ANSI interval types: `YearMonthIntervalType` or `DayTimeIntervalType`. In Spark 3.1 and earlier, such interval literals are converted to `CalendarIntervalType`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.interval.enabled` to `true`.
+
 ## Upgrading from Spark SQL 3.0 to 3.1
 
 - In Spark 3.1, statistical aggregation function includes `std`, `stddev`, `stddev_samp`, `variance`, `var_samp`, `skewness`, `kurtosis`, `covar_samp`, `corr` will return `NULL` instead of `Double.NaN` when `DivideByZero` occurs during expression evaluation, for example, when `stddev_samp` applied on a single element set. In Spark version 3.0 and earlier, it will return `Double.NaN` in such case. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.statisticalAggregate` to `true`.
```
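For reference, a minimal sketch of the migration switch described above, assuming a live `SparkSession` named `spark` (the config key comes from the migration note; that it can be toggled per session is an assumption):

```scala
// Assumption: spark.sql.legacy.interval.enabled is settable at runtime.
spark.conf.set("spark.sql.legacy.interval.enabled", "true")
// With the legacy flag on, unit-to-unit literals parse to CalendarIntervalType
// again; per the PR description, typeof(...) then reports the mixed type:
spark.sql("SELECT typeof(INTERVAL '1-1' YEAR TO MONTH)").show()
// expected (pre-3.2 behavior): interval
```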

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

Lines changed: 21 additions & 2 deletions

```diff
@@ -18,6 +18,7 @@
 package org.apache.spark.sql.catalyst.parser
 
 import java.util.Locale
+import java.util.concurrent.TimeUnit
 import javax.xml.bind.DatatypeConverter
 
 import scala.collection.JavaConverters._
@@ -2306,12 +2307,30 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
   }
 
   /**
-   * Create a [[CalendarInterval]] literal expression. Two syntaxes are supported:
+   * Create a [[CalendarInterval]] or ANSI interval literal expression.
+   * Two syntaxes are supported:
    * - multiple unit value pairs, for instance: interval 2 months 2 days.
    * - from-to unit, for instance: interval '1-2' year to month.
    */
   override def visitInterval(ctx: IntervalContext): Literal = withOrigin(ctx) {
-    Literal(parseIntervalLiteral(ctx), CalendarIntervalType)
+    val calendarInterval = parseIntervalLiteral(ctx)
+    if (ctx.errorCapturingUnitToUnitInterval != null && !conf.legacyIntervalEnabled) {
+      // Check the `to` unit to distinguish year-month and day-time intervals because
+      // `CalendarInterval` doesn't have enough info. For instance, new CalendarInterval(0, 0, 0)
+      // can be derived from INTERVAL '0-0' YEAR TO MONTH as well as from
+      // INTERVAL '0 00:00:00' DAY TO SECOND.
+      val toUnit = ctx.errorCapturingUnitToUnitInterval.body.to.getText.toLowerCase(Locale.ROOT)
+      if (toUnit == "month") {
+        assert(calendarInterval.days == 0 && calendarInterval.microseconds == 0)
+        Literal(calendarInterval.months, YearMonthIntervalType)
+      } else {
+        assert(calendarInterval.months == 0)
+        val micros = IntervalUtils.getDuration(calendarInterval, TimeUnit.MICROSECONDS)
+        Literal(micros, DayTimeIntervalType)
+      }
+    } else {
+      Literal(calendarInterval, CalendarIntervalType)
+    }
   }
 
   /**
```
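As a sanity check on the day-time encoding above, a small self-contained sketch (plain Scala, no Spark classes, e.g. for a REPL) recomputing the microsecond value that backs the `INTERVAL '1 01:02:03.123' DAY TO SECOND` literal from the PR description:

```scala
// 1 day, 1 hour, 2 minutes, 3.123 seconds, expressed as microseconds —
// the kind of Long that IntervalUtils.getDuration(..., TimeUnit.MICROSECONDS)
// produces for the DayTimeIntervalType literal in the diff above.
val seconds = ((1L * 24 + 1) * 60 + 2) * 60  // 90120 whole seconds
val micros = seconds * 1000000L + 3123000L   // plus the fractional 3.123 s
assert(micros == 90123123000L)
```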

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala

Lines changed: 31 additions & 29 deletions

```diff
@@ -714,37 +714,39 @@ class ExpressionParserSuite extends AnalysisTest {
     // Non Existing unit
     intercept("interval 10 nanoseconds", "invalid unit 'nanoseconds'")
 
-    // Year-Month intervals.
-    val yearMonthValues = Seq("123-10", "496-0", "-2-3", "-123-0", "\t -1-2\t")
-    yearMonthValues.foreach { value =>
-      val result = Literal(IntervalUtils.fromYearMonthString(value))
-      checkIntervals(s"'$value' year to month", result)
-    }
+    withSQLConf(SQLConf.LEGACY_INTERVAL_ENABLED.key -> "true") {
+      // Year-Month intervals.
+      val yearMonthValues = Seq("123-10", "496-0", "-2-3", "-123-0", "\t -1-2\t")
+      yearMonthValues.foreach { value =>
+        val result = Literal(IntervalUtils.fromYearMonthString(value))
+        checkIntervals(s"'$value' year to month", result)
+      }
 
-    // Day-Time intervals.
-    val datTimeValues = Seq(
-      "99 11:22:33.123456789",
-      "-99 11:22:33.123456789",
-      "10 9:8:7.123456789",
-      "1 0:0:0",
-      "-1 0:0:0",
-      "1 0:0:1",
-      "\t 1 0:0:1 ")
-    datTimeValues.foreach { value =>
-      val result = Literal(IntervalUtils.fromDayTimeString(value))
-      checkIntervals(s"'$value' day to second", result)
-    }
+      // Day-Time intervals.
+      val datTimeValues = Seq(
+        "99 11:22:33.123456789",
+        "-99 11:22:33.123456789",
+        "10 9:8:7.123456789",
+        "1 0:0:0",
+        "-1 0:0:0",
+        "1 0:0:1",
+        "\t 1 0:0:1 ")
+      datTimeValues.foreach { value =>
+        val result = Literal(IntervalUtils.fromDayTimeString(value))
+        checkIntervals(s"'$value' day to second", result)
+      }
 
-    // Hour-Time intervals.
-    val hourTimeValues = Seq(
-      "11:22:33.123456789",
-      "9:8:7.123456789",
-      "-19:18:17.123456789",
-      "0:0:0",
-      "0:0:1")
-    hourTimeValues.foreach { value =>
-      val result = Literal(IntervalUtils.fromDayTimeString(value, HOUR, SECOND))
-      checkIntervals(s"'$value' hour to second", result)
+      // Hour-Time intervals.
+      val hourTimeValues = Seq(
+        "11:22:33.123456789",
+        "9:8:7.123456789",
+        "-19:18:17.123456789",
+        "0:0:0",
+        "0:0:1")
+      hourTimeValues.foreach { value =>
+        val result = Literal(IntervalUtils.fromDayTimeString(value, HOUR, SECOND))
+        checkIntervals(s"'$value' hour to second", result)
+      }
     }
 
     // Unknown FROM TO intervals
```
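Without the legacy flag, these literals now parse to ANSI-typed literals rather than `CalendarInterval` ones, which is why the old expectations above had to be scoped under `SQLConf.LEGACY_INTERVAL_ENABLED`. A hedged sketch of the equivalent non-legacy expectation for the first year-month value (the month arithmetic is my own; `Literal` and `YearMonthIntervalType` come from the diffs above):

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.YearMonthIntervalType

// INTERVAL '123-10' YEAR TO MONTH: 123 years and 10 months collapse to a
// single Int month count under YearMonthIntervalType (assumed encoding,
// mirroring Literal(calendarInterval.months, YearMonthIntervalType) above).
val expectedMonths = 123 * 12 + 10 // = 1486
val expected = Literal(expectedMonths, YearMonthIntervalType)
```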

sql/core/src/test/resources/sql-tests/inputs/interval.sql

Lines changed: 2 additions & 0 deletions

```diff
@@ -51,6 +51,8 @@ select cast('- +1 second' as interval);
 select interval 13.123456789 seconds, interval -13.123456789 second;
 select interval 1 year 2 month 3 week 4 day 5 hour 6 minute 7 seconds 8 millisecond 9 microsecond;
 select interval '30' year '25' month '-100' day '40' hour '80' minute '299.889987299' second;
+select interval '0-0' year to month;
+select interval '0 0:0:0' day to second;
 select interval '0 0:0:0.1' day to second;
 select interval '10-9' year to month;
 select interval '20 15' day to hour;
```
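The two added zero-valued literals exercise exactly the ambiguous case called out in the `visitInterval` comment above. A small sketch using the real `CalendarInterval` constructor to show why the parsed value alone cannot determine the result type:

```scala
import org.apache.spark.unsafe.types.CalendarInterval

// Both INTERVAL '0-0' YEAR TO MONTH and INTERVAL '0 0:0:0' DAY TO SECOND
// parse to the same intermediate (months = 0, days = 0, microseconds = 0),
// so only the `to` unit of the literal's syntax can decide between
// YearMonthIntervalType and DayTimeIntervalType.
val fromYearMonth = new CalendarInterval(0, 0, 0L)
val fromDayTime = new CalendarInterval(0, 0, 0L)
assert(fromYearMonth == fromDayTime)
```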
