Skip to content

[SPARK-28395][SQL] Division operator support integral division #25158

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ object TypeCoercion {
CaseWhenCoercion ::
IfCoercion ::
StackCoercion ::
Division ::
Division(conf) ::
ImplicitTypeCasts ::
DateTimeOperations ::
WindowFrameCoercion ::
Expand Down Expand Up @@ -666,7 +666,7 @@ object TypeCoercion {
* Hive only performs integral division with the DIV operator. The arguments to / are always
* converted to fractional types.
*/
object Division extends TypeCoercionRule {
case class Division(conf: SQLConf) extends TypeCoercionRule {
override protected def coerceTypes(
plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
// Skip nodes who has not been resolved yet,
Expand All @@ -677,7 +677,12 @@ object TypeCoercion {
case d: Divide if d.dataType == DoubleType => d
case d: Divide if d.dataType.isInstanceOf[DecimalType] => d
case Divide(left, right) if isNumericOrNull(left) && isNumericOrNull(right) =>
Divide(Cast(left, DoubleType), Cast(right, DoubleType))
(left.dataType, right.dataType) match {
case (_: IntegralType, _: IntegralType) if conf.preferIntegralDivision =>
IntegralDivide(left, right)
case _ =>
Divide(Cast(left, DoubleType), Cast(right, DoubleType))
}
}

private def isNumericOrNull(ex: Expression): Boolean = {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1524,6 +1524,12 @@ object SQLConf {
.booleanConf
.createWithDefault(false)

val PREFER_INTEGRAL_DIVISION = buildConf("spark.sql.function.preferIntegralDivision")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a second thought, I think we should not add a new behavior that is not SQL standard. If we only need it in tests, shall we make it clear in the config name? and make it an internal config.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that SQL standard does not explain this case(integral / integral ), different databases have different implementations.
image

Copy link
Contributor

@cloud-fan cloud-fan Aug 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so I'd like to treat it as an internal config that is only used in the ported pgsql test cases. @wangyum can you send a follow-up PR? thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

.doc("When true, will perform integral division with the / operator " +
"if both sides are integral types.")
.booleanConf
.createWithDefault(false)

val ALLOW_CREATING_MANAGED_TABLE_USING_NONEMPTY_LOCATION =
buildConf("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation")
.internal()
Expand Down Expand Up @@ -2294,6 +2300,8 @@ class SQLConf extends Serializable with Logging {

def eltOutputAsString: Boolean = getConf(ELT_OUTPUT_AS_STRING)

def preferIntegralDivision: Boolean = getConf(PREFER_INTEGRAL_DIVISION)

def allowCreatingManagedTableUsingNonemptyLocation: Boolean =
getConf(ALLOW_CREATING_MANAGED_TABLE_USING_NONEMPTY_LOCATION)

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1456,7 +1456,7 @@ class TypeCoercionSuite extends AnalysisTest {

test("SPARK-15776 Divide expression's dataType should be casted to Double or Decimal " +
"in aggregation function like sum") {
val rules = Seq(FunctionArgumentConversion, Division)
val rules = Seq(FunctionArgumentConversion, Division(conf))
// Casts Integer to Double
ruleTest(rules, sum(Divide(4, 3)), sum(Divide(Cast(4, DoubleType), Cast(3, DoubleType))))
// Left expression is Double, right expression is Int. Another rule ImplicitTypeCasts will
Expand All @@ -1475,12 +1475,35 @@ class TypeCoercionSuite extends AnalysisTest {
}

test("SPARK-17117 null type coercion in divide") {
val rules = Seq(FunctionArgumentConversion, Division, ImplicitTypeCasts)
val rules = Seq(FunctionArgumentConversion, Division(conf), ImplicitTypeCasts)
val nullLit = Literal.create(null, NullType)
ruleTest(rules, Divide(1L, nullLit), Divide(Cast(1L, DoubleType), Cast(nullLit, DoubleType)))
ruleTest(rules, Divide(nullLit, 1L), Divide(Cast(nullLit, DoubleType), Cast(1L, DoubleType)))
}

test("SPARK-28395 Division operator support integral division") {
val rules = Seq(FunctionArgumentConversion, Division(conf))
Seq(true, false).foreach { preferIntegralDivision =>
withSQLConf(SQLConf.PREFER_INTEGRAL_DIVISION.key -> s"$preferIntegralDivision") {
val result1 = if (preferIntegralDivision) {
IntegralDivide(1L, 1L)
} else {
Divide(Cast(1L, DoubleType), Cast(1L, DoubleType))
}
ruleTest(rules, Divide(1L, 1L), result1)
val result2 = if (preferIntegralDivision) {
IntegralDivide(1, Cast(1, ShortType))
} else {
Divide(Cast(1, DoubleType), Cast(Cast(1, ShortType), DoubleType))
}
ruleTest(rules, Divide(1, Cast(1, ShortType)), result2)

ruleTest(rules, Divide(1L, 1D), Divide(Cast(1L, DoubleType), Cast(1D, DoubleType)))
ruleTest(rules, Divide(Decimal(1.1), 1L), Divide(Decimal(1.1), 1L))
}
}
}

test("binary comparison with string promotion") {
val rule = TypeCoercion.PromoteStrings(conf)
ruleTest(rule,
Expand Down