Commit a056f69

ulysses-you authored and cloud-fan committed
[SPARK-41219][SQL] IntegralDivide use decimal(1, 0) to represent 0
### What changes were proposed in this pull request?

0 is a special case for decimal, whose data type can be decimal(0, 0); to be safe we should use decimal(1, 0) to represent 0.

### Why are the changes needed?

This fixes a data correctness regression. We no longer promote the decimal precision since the decimal binary operator refactor in #36698. However, that causes the intermediate decimal type of `IntegralDivide` to be decimal(0, 0). This is dangerous because Spark does not actually support decimal(0, 0), e.g.:

```sql
-- works with the in-memory catalog
create table t (c decimal(0, 0)) using parquet;

-- fails with parquet
-- java.lang.IllegalArgumentException: Invalid DECIMAL precision: 0
--   at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:57)
insert into table t values(0);

-- fails with the hive catalog
-- Caused by: java.lang.IllegalArgumentException: Decimal precision out of allowed range [1,38]
--   at org.apache.hadoop.hive.serde2.typeinfo.HiveDecimalUtils.validateParameter(HiveDecimalUtils.java:44)
create table t (c decimal(0, 0)) using parquet;
```

Since decimal(0, 0) means the data is 0, to be safe we use decimal(1, 0) to represent 0 for `IntegralDivide`.

### Does this PR introduce _any_ user-facing change?

Yes, bug fix.

### How was this patch tested?

Added a test.

Closes #38760 from ulysses-you/SPARK-41219.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
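To make the fix concrete, here is a minimal Python sketch (not Spark's actual implementation; the function name and the 38-digit cap are illustrative, with 38 being Spark's maximum decimal precision) of how the intermediate decimal type for `IntegralDivide` is derived from the operand types `decimal(p1, s1)` and `decimal(p2, s2)`:

```python
def integral_divide_result_type(p1, s1, p2, s2):
    """Hypothetical helper mirroring the division rule in arithmetic.scala."""
    # Integral division always has scale 0; the integer digits follow
    # the division rule: p1 - s1 + s2.
    int_dig = p1 - s1 + s2
    # The fix: int_dig == 0 means the result can only be 0, but Spark,
    # Parquet, and Hive all reject decimal(0, 0), so use decimal(1, 0).
    precision = 1 if int_dig == 0 else int_dig
    return (min(precision, 38), 0)  # (precision, scale), capped like DecimalType.bounded

# The regression case from the test: cast(a as decimal(7,7)) div 100,
# where the literal 100 has type decimal(3, 0).
print(integral_divide_result_type(7, 7, 3, 0))  # -> (1, 0) after the fix
```

Before the fix, the same inputs would have produced `decimal(0, 0)` (since 7 - 7 + 0 = 0), which fails when written through Parquet or the Hive catalog.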
1 parent e593344 commit a056f69

File tree

2 files changed: +8 -1 lines changed


sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala

Lines changed: 3 additions & 1 deletion
```diff
@@ -865,7 +865,9 @@ case class IntegralDivide(
       // This follows division rule
       val intDig = p1 - s1 + s2
       // No precision loss can happen as the result scale is 0.
-      DecimalType.bounded(intDig, 0)
+      // If intDig is 0 that means the result data is 0, to be safe we use decimal(1, 0)
+      // to represent 0.
+      DecimalType.bounded(if (intDig == 0) 1 else intDig, 0)
     }

   override def sqlOperator: String = "div"
```

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

Lines changed: 5 additions & 0 deletions
```diff
@@ -3584,6 +3584,11 @@ class DataFrameSuite extends QueryTest
       assert(row.getInt(0).toString == row.getString(3))
     }
   }
+
+  test("SPARK-41219: IntegralDivide use decimal(1, 0) to represent 0") {
+    val df = Seq("0.5944910").toDF("a")
+    checkAnswer(df.selectExpr("cast(a as decimal(7,7)) div 100"), Row(0))
+  }
 }

 case class GroupByKey(a: Int, b: Int)
```
