Commit a056f69

ulysses-you authored and cloud-fan committed
[SPARK-41219][SQL] IntegralDivide use decimal(1, 0) to represent 0
### What changes were proposed in this pull request?

0 is a special case for decimal, whose data type can be decimal(0, 0); to be safe we should use decimal(1, 0) to represent 0.

### Why are the changes needed?

This fixes a data correctness regression. We no longer promote the decimal precision since the decimal binary operator refactor in #36698. However, that causes the intermediate decimal type of `IntegralDivide` to be decimal(0, 0). This is dangerous because Spark does not actually support decimal(0, 0), e.g.:

```sql
-- works with the in-memory catalog
create table t (c decimal(0, 0)) using parquet;

-- fails with parquet
-- java.lang.IllegalArgumentException: Invalid DECIMAL precision: 0
--   at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:57)
insert into table t values(0);

-- fails with the hive catalog
-- Caused by: java.lang.IllegalArgumentException: Decimal precision out of allowed range [1,38]
--   at org.apache.hadoop.hive.serde2.typeinfo.HiveDecimalUtils.validateParameter(HiveDecimalUtils.java:44)
create table t (c decimal(0, 0)) using parquet;
```

Since decimal(0, 0) means the data is 0, to be safe we use decimal(1, 0) to represent 0 for `IntegralDivide`.

### Does this PR introduce _any_ user-facing change?

Yes, bug fix.

### How was this patch tested?

Added a test.

Closes #38760 from ulysses-you/SPARK-41219.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
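To make the fix concrete, here is a minimal Python sketch (not Spark's actual implementation; the function name and the 38-digit cap are illustrative, with 38 being Spark's maximum decimal precision) of how the intermediate decimal type for `IntegralDivide` is derived from the operand types `decimal(p1, s1)` and `decimal(p2, s2)`:

```python
def integral_divide_result_type(p1, s1, p2, s2):
    """Hypothetical helper mirroring the division rule in arithmetic.scala."""
    # Integral division always has scale 0; the integer digits follow
    # the division rule: p1 - s1 + s2.
    int_dig = p1 - s1 + s2
    # The fix: int_dig == 0 means the result can only be 0, but Spark,
    # Parquet, and Hive all reject decimal(0, 0), so use decimal(1, 0).
    precision = 1 if int_dig == 0 else int_dig
    return (min(precision, 38), 0)  # (precision, scale), capped like DecimalType.bounded

# The regression case from the test: cast(a as decimal(7,7)) div 100,
# where the literal 100 has type decimal(3, 0).
print(integral_divide_result_type(7, 7, 3, 0))  # -> (1, 0) after the fix
```

Before the fix, the same inputs would have produced `decimal(0, 0)` (since 7 - 7 + 0 = 0), which fails when written through Parquet or the Hive catalog.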
1 parent e593344 commit a056f69

File tree

2 files changed: +8 -1 lines changed


sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala

Lines changed: 3 additions & 1 deletion
```diff
@@ -865,7 +865,9 @@ case class IntegralDivide(
       // This follows division rule
       val intDig = p1 - s1 + s2
       // No precision loss can happen as the result scale is 0.
-      DecimalType.bounded(intDig, 0)
+      // If intDig is 0 that means the result data is 0, to be safe we use decimal(1, 0)
+      // to represent 0.
+      DecimalType.bounded(if (intDig == 0) 1 else intDig, 0)
     }

   override def sqlOperator: String = "div"
```

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

Lines changed: 5 additions & 0 deletions
```diff
@@ -3584,6 +3584,11 @@ class DataFrameSuite extends QueryTest
       assert(row.getInt(0).toString == row.getString(3))
     }
   }
+
+  test("SPARK-41219: IntegralDivide use decimal(1, 0) to represent 0") {
+    val df = Seq("0.5944910").toDF("a")
+    checkAnswer(df.selectExpr("cast(a as decimal(7,7)) div 100"), Row(0))
+  }
 }

 case class GroupByKey(a: Int, b: Int)
```
