
[SPARK-32018][SQL][FollowUp][3.0] Throw exception on decimal value overflow of sum aggregation #29404


Conversation

gengliangwang
Member

What changes were proposed in this pull request?

This is a followup of #29125
In branch 3.0:

  1. For hash aggregation, before [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value #29125, decimal overflow in sum aggregation caused a runtime exception; after #29125, it can silently produce a wrong result.
  2. For sort aggregation, with or without #29125, decimal overflow can produce a wrong result.

In the master branch (the future 3.1 release), the problem doesn't exist, since #27627 added a flag to the aggregation buffer for marking whether overflow has happened. However, the aggregation buffer is written into streaming checkpoints, so we can't change the aggregation buffer to resolve the issue in branch 3.0.

As there is no easy way in branch 3.0 to choose between returning null and throwing an exception on overflow based on spark.sql.ansi.enabled, we have to make a choice here: always throw an exception on decimal value overflow of sum aggregation.
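
To make the user-facing effect concrete, here is a minimal, hypothetical illustration (the inline table and column name are made up, not from this PR); both input values fit DECIMAL(38, 18) individually, but their sum does not:

```scala
// Run in spark-shell against branch-3.0 with this patch applied.
// Before the patch, the overflowed sum could silently come back wrong;
// with the patch it fails with java.lang.ArithmeticException.
spark.sql("""
  SELECT sum(CAST(v AS DECIMAL(38, 18))) AS total
  FROM VALUES (99999999999999999999.1), (99999999999999999999.1) AS t(v)
""").show()
```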

Why are the changes needed?

Avoid returning wrong result in decimal value sum aggregation.

Does this PR introduce any user-facing change?

Yes, an exception is now always thrown on decimal value overflow of sum aggregation, instead of a possible wrong result.

How was this patch tested?

Unit test case


@SparkQA

SparkQA commented Aug 11, 2020

Test build #127309 has finished for PR 29404 at commit 0a23279.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 11, 2020

Test build #127311 has finished for PR 29404 at commit f21f1a0.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 11, 2020

Test build #127316 has finished for PR 29404 at commit a8be9e1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 11, 2020

Test build #127313 has finished for PR 29404 at commit b9af4f5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

This adds perf overhead as we need to check overflow after each Add operation, while the master branch only checks overflow at the end because we have an extra agg buffer slot.

I think this perf overhead is necessary to avoid this correctness bug in 3.0/2.4, but I'm open to other opinions. cc @skambha @dongjoon-hyun @viirya @maropu
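
Roughly, the trade-off looks like this; a simplified, self-contained sketch (the `checkOverflow` helper and `SumBuffer` shape are stand-ins for illustration, not Spark's actual Sum implementation):

```scala
// Stand-in for Spark's Decimal.toPrecision check: fail when the running
// total no longer fits the result type's precision (38 digits here).
def checkOverflow(v: BigDecimal): BigDecimal =
  if (v.precision > 38)
    throw new ArithmeticException(s"$v cannot be represented as Decimal(38, 18)")
  else v

// Branch-3.0 with this patch: validate after every Add, i.e. once per input row.
def updatePerRow(runningSum: BigDecimal, input: BigDecimal): BigDecimal =
  checkOverflow(runningSum + input)

// Master (3.1): the aggregation buffer carries an extra slot, so intermediate
// values need not be validated and the check happens once at evaluation time.
final case class SumBuffer(sum: BigDecimal, isEmpty: Boolean)

def evaluateAtEnd(buffer: SumBuffer): Option[BigDecimal] =
  if (buffer.isEmpty) None else Some(checkOverflow(buffer.sum))
```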

@viirya
Member

viirya commented Aug 12, 2020

However, the aggregation buffer is written into streaming checkpoints, so we can't change the aggregation buffer to resolve the issue.

Is this saying the isEmpty in Sum cannot be backported to branch-3.0?

@viirya
Member

viirya commented Aug 12, 2020

And it looks like we don't have many choices. I think correctness should be considered first.

@cloud-fan
Contributor

Is this saying the isEmpty in Sum cannot be backported to branch-3.0?

I think so. 3.0 doesn't even have mechanisms to detect incompatible state store format, and it may be too much work to backport the bug fix and the streaming state store checks.
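
For context, the reason the master-branch fix is hard to backport is that it widens the Sum aggregation buffer, and that buffer layout is exactly what streaming checkpoints persist. A rough sketch of the two layouts (using catalyst's AttributeReference purely for illustration; this is not the actual Sum code):

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.types.{BooleanType, DecimalType}

// branch-3.0: a single buffer slot holding the running total.
val bufferV30 = Seq(AttributeReference("sum", DecimalType(38, 18))())

// master (3.1) after #27627: an extra isEmpty slot lets Sum tell
// "no input rows" apart from "overflowed to null", so overflow can be
// detected once at the end.
val bufferV31 = Seq(
  AttributeReference("sum", DecimalType(38, 18))(),
  AttributeReference("isEmpty", BooleanType)())

// Streaming state stores persist rows in the buffer layout, so a query
// restarted from an old one-slot checkpoint could not be read back with
// the new two-slot schema without extra compatibility checks.
```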

@maropu
Member

maropu commented Aug 12, 2020

And it looks like we don't have many choices. I think correctness should be considered first.

I have the same opinion as @viirya, and I think the overhead is inevitable.

@dongjoon-hyun
Member

dongjoon-hyun commented Aug 12, 2020

Thank you for pinging me, @gengliangwang and @cloud-fan .

Member

@dongjoon-hyun left a comment


+1, LGTM.

@cloud-fan
Contributor

cloud-fan commented Aug 13, 2020

thanks, merging to 3.0!

cloud-fan pushed a commit that referenced this pull request Aug 13, 2020
[SPARK-32018][SQL][FollowUp][3.0] Throw exception on decimal value overflow of sum aggregation

### What changes were proposed in this pull request?

This is a followup of #29125
In branch 3.0:
1. For hash aggregation, before #29125, decimal overflow in sum aggregation caused a runtime exception; after #29125, it can silently produce a wrong result.
2. For sort aggregation, with or without #29125, decimal overflow can produce a wrong result.

In the master branch (the future 3.1 release), the problem doesn't exist, since #27627 added a flag to the aggregation buffer for marking whether overflow has happened. However, the aggregation buffer is written into streaming checkpoints, so we can't change the aggregation buffer to resolve the issue in branch 3.0.

As there is no easy way in branch 3.0 to choose between returning null and throwing an exception on overflow based on `spark.sql.ansi.enabled`, we have to make a choice here: always throw an exception on decimal value overflow of sum aggregation.

### Why are the changes needed?

Avoid returning wrong result in decimal value sum aggregation.

### Does this PR introduce _any_ user-facing change?

Yes, an exception is now always thrown on decimal value overflow of sum aggregation, instead of a possible wrong result.

### How was this patch tested?

Unit test case

Closes #29404 from gengliangwang/fixSum.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan cloud-fan closed this Aug 13, 2020
@dongjoon-hyun
Member

Thank you all!

@maropu
Member

maropu commented Aug 16, 2020

It seems this commit caused a valid test failure in DataFrameSuite:

[info] - SPARK-28224: Aggregate sum big decimal overflow *** FAILED *** (384 milliseconds)
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 77.0 failed 1 times, most recent failure: Lost task 0.0 in stage 77.0 (TID 197, 192.168.11.10, executor driver): java.lang.ArithmeticException: Decimal(expanded,111111111111111111110.246000000000000000,39,18}) cannot be represented as Decimal(38, 18).
[info] 	at org.apache.spark.sql.types.Decimal.toPrecision(Decimal.scala:369)
[info] 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregate_sum_0$(Unknown Source)
[info] 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doConsume_0$(Unknown Source)
[info] 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregateWithoutKey_0$(Unknown Source)
[info] 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
[info] 	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-maven-hadoop-2.7-hive-2.3/533/

I've checked that the test fails with this commit and passes without it in my local env.
@gengliangwang Could you check the failure?

@gengliangwang
Member Author

gengliangwang commented Aug 17, 2020

@maropu Thanks for reporting. I have created #29448 to fix the test failure.
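
That fix presumably updates the test to expect the overflow error. A hypothetical sketch of such an expectation (names and values are illustrative, not the actual change in #29448):

```scala
import org.scalatest.Assertions._
import org.apache.spark.sql.functions.{lit, sum}
import org.apache.spark.sql.types.DecimalType

// Two near-max decimals whose sum no longer fits Decimal(38, 18).
val df = spark.range(2)
  .select(lit(BigDecimal("55555555555555555555.123"))
    .cast(DecimalType(38, 18)).as("decNum"))

// With this commit the overflow surfaces at runtime, so the assertion
// changes from checking a result to intercepting the failure.
val e = intercept[Exception] { df.select(sum("decNum")).collect() }
assert(e.getMessage.contains("cannot be represented as Decimal(38, 18)"))
```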

@cloud-fan
Contributor

Ah, it's branch-3.0, so the GitHub Actions workflow doesn't run there. @HyukjinKwon, how hard is it to get GitHub Actions running on branch-3.0?

cloud-fan pushed a commit that referenced this pull request Aug 17, 2020
### What changes were proposed in this pull request?

Revert the SPARK-32018-related changes in branch 3.0: #29125 and #29404

### Why are the changes needed?

#29404 was made to fix the correctness regression introduced by #29125. However, the resulting behavior of decimal overflow in non-ANSI mode is strange:
1. from 3.0.0 to 3.0.1: decimal overflow throws an exception instead of returning null;
2. from 3.0.1 to 3.1.0: decimal overflow returns null instead of throwing an exception.

So, this PR proposes to revert both #29404 and #29125, so that Spark returns null on decimal overflow in both Spark 3.0.0 and Spark 3.0.1.

### Does this PR introduce _any_ user-facing change?

Yes, Spark will return null on decimal overflow in Spark 3.0.1.

### How was this patch tested?

Unit tests

Closes #29450 from gengliangwang/revertDecimalOverflow.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@HyukjinKwon
Member

@cloud-fan, I will port it back to other branches. I think it's doable.
