[SPARK-26218][SQL] Overflow on arithmetic operations returns incorrect result #21599
Conversation
Test build #92132 has finished for PR 21599 at commit
Test build #92133 has finished for PR 21599 at commit
Test build #92140 has finished for PR 21599 at commit
@cloud-fan @gatorsmile the main issue now causing the UT failures is that, since we previously allowed overflows, an aggregation could contain an overflow that is later cancelled by another overflow in the opposite direction (see the two UT failures in the Range suite). The problem is that we can only detect an overflow by checking the incoming operands, so we cannot defer the check until the whole aggregation has completed. The way MySQL deals with this case is returning a … Therefore, I'd like to know your thoughts about this: on one side it is a pretty serious bug which should be fixed, on the other I cannot think of any solution which doesn't involve behavior changes.
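To make the scenario above concrete, here is a minimal plain-Scala sketch (not Spark code) of an aggregation whose intermediate sum overflows and is then "fixed" by a second overflow in the opposite direction, which is exactly why the check cannot simply be deferred to the end:

```scala
object OverflowCancellation {
  def main(args: Array[String]): Unit = {
    val values = Seq(Long.MaxValue, 1L, -1L)

    // Plain Long addition wraps silently, so the two overflows cancel out and the
    // final result happens to be correct.
    val wrapped = values.reduce(_ + _)
    println(wrapped == Long.MaxValue) // true

    // Checking every partial sum (e.g. with Math.addExact) fails at the first overflow,
    // even though the overall aggregation would have recovered.
    try {
      values.reduce((a, b) => Math.addExact(a, b))
    } catch {
      case _: ArithmeticException => println("overflow detected mid-aggregation")
    }
  }
}
```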
is this a regression in Spark 2.3? which commit caused it?
@cloud-fan it is not a regression, it has been like that at least since 2.0...
Test build #92172 has finished for PR 21599 at commit
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = dataType match {
  case _: DecimalType =>
    defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.$decimalMethod($eval2)")
  case CalendarIntervalType =>
    defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.$calendarIntervalMethod($eval2)")
  // In the following cases, overflow can happen, so we need to check the result is valid.
  // Otherwise we throw an ArithmeticException
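For the integral branches mentioned in the comment above, the semantics being added are roughly those of the `*Exact` helpers in `java.lang.Math`; the following is a hedged sketch of that behaviour in plain Scala, not the actual generated Java:

```scala
val a = Long.MaxValue
val b = 1L

// Plain addition wraps around silently.
val wrapped = a + b // Long.MinValue

// The exact-math helpers throw instead of returning a wrong value.
val checked =
  try Some(Math.addExact(a, b))
  catch { case _: ArithmeticException => None }

println(s"wrapped = $wrapped, checked = $checked")
```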
In current Spark we are very conservative about runtime errors, as they may break the data pipeline midway, and returning null is a commonly used strategy. Shall we follow it here? We can throw an exception when we have a strict mode.
Personally, I am quite against returning null. It is not something a user expects, so he/she is likely not to check for it (when I see a NULL myself, I think that one of the 2 operands was NULL, not that an overflow occurred), so he/she won't realize the issue and will end up with corrupted data. Moreover, this is not how RDBMSs behave and it is against the SQL standard. So I think the behavior chosen for DECIMAL was wrong and I'd prefer not to introduce the same behavior in other places too.
Anyway, I see your point about consistency across the codebase and it makes sense.
I'd love to know @gatorsmile's and @hvanhovell's opinions too.
@gatorsmile @hvanhovell do you have time to check this and give your opinion here? Thanks.
Why don't we fix it to return null and throw an exception when the configuration is on? An overflowed value is already quite pointless, so changing the behaviour to return null by default might not be so harmful.
Thanks for your comment @HyukjinKwon. The issue with returning null is described in #21599 (comment) (moreover, this behavior would be against the SQL standard, but that's a minor point).
As I am working on #25300, +1 for returning null on overflow by default, along with @cloud-fan and @HyukjinKwon. This makes arithmetic operations and casting consistent.
I think null is better than a nonsensical number on overflow.
I don't agree with this proposal, for the reasons explained in the comment I mentioned earlier. Making all arithmetic operations nullable is too broad a change, and I don't think it is worth it for the little benefit.
Test build #92208 has finished for PR 21599 at commit
Test build #92216 has finished for PR 21599 at commit
kindly ping @gatorsmile @hvanhovell
@cloud-fan @gatorsmile @hvanhovell what do you think about deciding on a strategy for handling this case and trying to make it happen in 2.4, as this is a correctness bug? Thanks.
Test build #93107 has finished for PR 21599 at commit
kindly ping @cloud-fan @gatorsmile @hvanhovell
I prefer returning null and introducing a strict mode in Spark 3.0. We can then revisit all the cases that return null and decide whether we should fail when strict mode is on.
@cloud-fan ok, but then all arithmetic operations would always have to be nullable. If there are no objections, I am going to update the PR according to your suggestion. Thanks.
ah that's a good point. Then maybe we should keep this "java style behavior" and add "strict mode" later.
I don't think we can change behavior like this without introducing a config. In Spark 3.0 we will add a "strict mode" and closely follow the SQL standard. But for Spark 2.4, how about we keep the "java style behavior" as it is? We can add some documentation about it, though.
@cloud-fan I think this is just wrong behavior (being against the SQL standard is not the only issue), so I am not sure it makes sense to have it controlled by a config. Anyway, since it is a behavior change, I agree we can target it for 3.0 and add some documentation now, in order to let users know about this issue.
 * This method tests binary expressions by feeding them arbitrary literals of `dataType1`
 * and `dataType2`.
 */
def checkConsistencyBetweenInterpretedAndCodegenAllowingException(
it seems to me that `checkConsistencyBetweenInterpretedAndCodegen` should handle exceptions as well, as that's part of the consistency check.
I am not sure what your suggestion is here. Are you suggesting that we should collapse these 2 methods?
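For reference, a rough sketch of what a consistency check that also covers exceptions could look like (purely illustrative, not the actual Spark test helper):

```scala
// Hypothetical helper: the interpreted and codegen paths are consistent if they either
// both return the same value or both fail with the same exception type.
def consistentIncludingExceptions(interpreted: () => Any, codegen: () => Any): Boolean = {
  def outcome(run: () => Any): Either[Class[_], Any] =
    try Right(run()) catch { case e: Exception => Left(e.getClass) }
  outcome(interpreted) == outcome(codegen)
}
```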
checkValues(Decimal(Double.MaxValue), Double.MaxValue, 0L)
checkValues(Decimal(Double.MinValue), Double.MinValue, 0L)
assert(Decimal(Double.MaxValue).toDouble == Double.MaxValue)
assert(Decimal(Double.MinValue).toDouble == Double.MinValue)
why change this?
because `toLong` overflows in this case
I think we have reverted the changes in `Decimal.toLong`?
yes, I can revert these changes and the tests pass, but here we are basically checking that we are returning wrong values, so I don't think it makes any sense. Anyway I can revert the change if you think we should.
Yes, we know `Decimal.toLong` can return a wrong result, and we are going to fix it by making cast follow the SQL standard. Let's revert it and fix it in the cast PR.
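For context, the JVM behaviour behind this exchange can be reproduced without Spark at all: `java.math.BigDecimal.longValue` keeps only the low-order 64 bits when the value does not fit in a `Long` (a plain-JVM illustration, not Spark's `Decimal` class):

```scala
// Double.MaxValue is an exact multiple of 2^971, so its low 64 bits are all zero:
// the conversion silently yields 0 instead of failing.
val huge = new java.math.BigDecimal(Double.MaxValue)
println(huge.longValue()) // 0
```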
-- We cannot test this when checkOverflow=true here
-- because exception happens in the executors and the
-- output stacktrace cannot have an exact match
set spark.sql.arithmetic.checkOverflow=false;
is it necessary? It's false by default.
I put it there to enforce this explicitly, so that any future change to the default value won't affect this test.
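For completeness, a hedged sketch of pinning the flag programmatically in a test-like setting; the config name is taken from this revision of the PR and may differ in the final merged version:

```scala
import org.apache.spark.sql.SparkSession

object CheckOverflowFlagDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("overflow-demo").getOrCreate()

    // Pin the flag explicitly instead of relying on the default.
    spark.conf.set("spark.sql.arithmetic.checkOverflow", "false")
    spark.sql("select 9223372036854775807 + 1").show() // wraps around silently

    // With the check enabled, the same query is expected to fail with an ArithmeticException.
    spark.conf.set("spark.sql.arithmetic.checkOverflow", "true")

    spark.stop()
  }
}
```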
LGTM except a few comments, thanks for working on it!
thank you @cloud-fan for your help on this!
Test build #108409 has finished for PR 21599 at commit
Test build #108418 has finished for PR 21599 at commit
retest this please
Test build #108443 has finished for PR 21599 at commit
retest this please
@mgaido91 Thanks for the work. The PR overall LGTM.
I am aware that it has been a lot of work. But could you add corner test cases for the overflow of byte/short/int types?
Test build #108453 has finished for PR 21599 at commit
thanks @gengliangwang, I added them.
val e3 = Multiply(maxLongLiteral, Literal(2L))
val e4 = Add(minLongLiteral, minLongLiteral)
val e5 = Subtract(minLongLiteral, maxLongLiteral)
val e6 = Multiply(minLongLiteral, minLongLiteral)
shall we also test `UnaryMinus`? BTW do you know why the previous mistake in the `UnaryMinus` code was not caught by the existing tests?
yes, it was my fault: I was not testing "normal" cases with the flag turned on. I fixed it. `UnaryMinus` cases are already tested in its own UT, so I think it is useless to add them here too.
makes sense, thanks!
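As an aside, the `UnaryMinus` overflow case is easy to illustrate in plain Scala (a sketch independent of the Spark expression itself):

```scala
// Negating Long.MinValue is not representable as a Long, so plain negation wraps back
// to Long.MinValue, while Math.negateExact throws instead.
println(-Long.MinValue == Long.MinValue) // true: silent wrap-around
try {
  Math.negateExact(Long.MinValue)
} catch {
  case e: ArithmeticException => println(s"negateExact failed: ${e.getMessage}")
}
```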
Test build #108467 has finished for PR 21599 at commit
thanks, merging to master!
thank you all for your great help on this!
…here is a corresponding function in Math

### What changes were proposed in this pull request?
1. After #21599, if the option "spark.sql.failOnIntegralTypeOverflow" is enabled, all the binary arithmetic operators will use the exact version of the function. However, only `Add`/`Subtract`/`Multiply` have a corresponding exact function in `java.lang.Math`. When the option "spark.sql.failOnIntegralTypeOverflow" is enabled, a runtime exception "BinaryArithmetics must override either exactMathMethod or genCode" is thrown if any other binary arithmetic operator is used, such as "Divide" or "Remainder". The exact math method should be called only when there is a corresponding function in `java.lang.Math`.
2. Revise the log output of casting to `Int`/`Short`.
3. Enable `spark.sql.failOnIntegralTypeOverflow` for pgSQL tests in `SQLQueryTestSuite`.

### Why are the changes needed?
1. Fix the bugs of #21599.
2. The pgSQL test cases intend to check the overflow of integer/long types, so we should enable `spark.sql.failOnIntegralTypeOverflow`.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Unit test.

Closes #25804 from gengliangwang/enableIntegerOverflowInSQLTest.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
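A small sketch of the idea in that follow-up (the names below are illustrative, not the actual Spark internals): only operators with an `*Exact` counterpart in `java.lang.Math` should route through the exact path.

```scala
// Hypothetical mapping: Add/Subtract/Multiply have addExact/subtractExact/multiplyExact,
// while Divide, Remainder, etc. have no Math counterpart and keep the normal code path.
def exactMathMethod(operator: String): Option[String] = operator match {
  case "Add"      => Some("addExact")
  case "Subtract" => Some("subtractExact")
  case "Multiply" => Some("multiplyExact")
  case _          => None
}
```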
What changes were proposed in this pull request?
When an overflow occurs while performing an arithmetic operation, we currently return an incorrect value. Instead, we should throw an exception, as stated in the SQL standard.
How was this patch tested?
added UT + existing UTs (improved)