[SPARK-25454][SQL] add a new config for picking minimum precision for integral literals #22494
cloud-fan wants to merge 3 commits into apache:master from cloud-fan/decimal
Conversation
Maybe we should also consider turning off DECIMAL_OPERATIONS_ALLOW_PREC_LOSS by default.
I don't really agree with this. As I mentioned here, I think we can avoid using negative scales regardless of the value of the flag, since before #20023 a negative scale here was not possible. So I think we can just add a check to avoid a negative scale there. Doing so, there is no need to turn the flag off by default.
Can you open a PR for it? I did some experiments locally and this doesn't seem to work.
Sure @cloud-fan, I'll do it ASAP.
oh, I see now the problem. And it is much bigger than I thought, sorry. Here we were returning a negative scale.
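For context, negative scales are not a Spark invention: java.math.BigDecimal permits them for values written in scientific notation, which is how they can sneak into decimal type inference. A minimal standalone illustration (my own example, not code from this PR):

```scala
// A BigDecimal parsed from scientific notation carries a negative scale:
// the value is unscaledValue * 10^(-scale).
val d = new java.math.BigDecimal("1e10")
println(d.scale)          // -10
println(d.precision)      // 1
println(d.unscaledValue)  // 1, i.e. the value is 1 * 10^10
```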
yea, that's why I changed the test. The point is, we know there is a bug and it's hard to fix. What I'm trying to do here is not to fix the bug, but to allow users to fully turn off #20023. Thus we can avoid regressions and not make it worse.
2.3.2 and 2.4.0 are both in the RC stage, and I don't want to spend a lot of time fixing this long-standing bug and block 2 releases. cc @jerryshao too.
yes @cloud-fan, I see your point. I am honestly neutral on this change. It is true that it avoids regressions from previous cases, but it is also true that it doesn't make the behavior better: I mean, in some (weird) cases like this one it behaves better, but there are other cases where the opposite is true, since we were overestimating the precision.
This PR is a no-op if users don't turn off #20023. The benefit of this PR is: before we fully fix that bug, if a user hits it, they can turn off #20023 to temporarily work around it. So I don't see any harm in this PR. Turning off #20023 by default is another story and I'd like to get more feedback before doing it.
If the user doesn't turn off the flag, of course nothing changes. If the user does, then consider this case: it is true that after this PR, Spark will support (with the exact same precision) all the operations that worked fine before #20023, but this PR can also break some operations that started working fine after #20023 (even with the flag turned off).
Why do we care about breaking operations when turning off a behavior-change config? The config is prepared for this case: if a user hits a problem with the new behavior, they can use this config to restore the previous behavior. For this config, the previous behavior is the behavior before #20023. If your argument is that picking a precise precision for literals is an independent feature and not related to #20023, I'm OK with creating a new config for it.
Test build #96360 has finished for PR 22494 at commit
@cloud-fan The change looks fine to me. I looked at the failure. It correctly switches to the old behaviour when this config is set to off.
Test build #96392 has finished for PR 22494 at commit
retest this please
Yes, this is, I think, a better option. Indeed, what I meant was this: let's imagine I am a Spark 2.3.0 user running an existing query. Using another config is therefore a better workaround IMO.
I tried to add a new config, but decided not to do it. The problem is, when users hit SPARK-25454, they must turn off both configs.
If a user hits SPARK-25454, the value of DECIMAL_OPERATIONS_ALLOW_PREC_LOSS doesn't matter.
No, for SPARK-25454, turning it off is helpless. The only reason to turn it off is to get the 2.3 behavior back.
Sorry, my mistake. I'm talking about the specific query reported at https://issues.apache.org/jira/browse/SPARK-22036?focusedCommentId=16618104&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16618104 , which needs the flags turned off. SPARK-25454 is a long-standing bug and currently we can't help users work around it. My point is, to work around this regression, users must turn off both configs.
I don't agree with it. The reason why this is a regression is that users have no way to get the same (and expected) result of 2.3 in 2.4.
Yes, I am talking about that too, and there is no need to turn off DECIMAL_OPERATIONS_ALLOW_PREC_LOSS.
Test build #96406 has finished for PR 22494 at commit
sorry, I screwed up my local branch and drew a wrong conclusion. Turning off the precise precision for integral literals can fix the regression. So it's better to add a config for it.
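Putting the workaround together, a hedged spark-shell sketch (the config keys are taken from this PR's diff and from #20023; the query is my own illustration, not the one from the JIRA ticket):

```scala
// Restore the pre-#20023 typing of integral literals.
spark.conf.set("spark.sql.literal.pickMinimumPrecision", "false")
// Depending on the query, the precision-loss flag from #20023 may need
// to be turned off as well, as discussed above.
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "false")

// An assumed example of a decimal / integral-literal binary operation.
spark.sql("SELECT CAST(1 AS DECIMAL(38, 18)) * 100").printSchema()
```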
| .doc("When integral literals are used with decimals in binary operators, Spark will " + | ||
| "pick a precise precision for the literals to calculate the precision and scale " + | ||
| "of the result decimal, when this config is true. By picking a precise precision, we " + | ||
| "can avoid wasting precision, to reduce the possibility of overflow.") |
`to reduce the possibility of overflow`: actually this is not true; it depends on the value of DECIMAL_OPERATIONS_ALLOW_PREC_LOSS. If DECIMAL_OPERATIONS_ALLOW_PREC_LOSS is true, the risk is a precision loss, but we don't overflow. If it is false, then this statement is right.
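To make the reviewer's distinction concrete, a small spark-shell illustration (the queries and the resulting types are my own example under Spark's decimal arithmetic rules, not taken from the PR):

```scala
// With allowPrecisionLoss=true, Spark trims the result scale so the type
// fits within 38 digits: fractional digits may be lost, but values do
// not overflow.
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "true")
spark.sql("SELECT CAST(1 AS DECIMAL(38, 18)) * CAST(1 AS DECIMAL(38, 18)) AS r")
  .printSchema()  // r: decimal(38, 6), scale reduced, no overflow

// With the flag off, the result keeps a large scale (decimal(38, 36)
// here), so any product needing more than 2 integer digits overflows to
// null instead of losing precision.
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "false")
spark.sql("SELECT CAST(100 AS DECIMAL(38, 18)) * CAST(100 AS DECIMAL(38, 18)) AS r")
  .show()  // r is NULL: 10000 does not fit in decimal(38, 36)
```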
.booleanConf
.createWithDefault(true)
val LITERAL_PRECISE_PRECISION =
mmh... PRECISE_PRECISION sounds weird... can we look for a better one? I don't have a suggestion right now, but I'll think about it.
| buildConf("spark.sql.literal.precisePrecision") | ||
| .internal() | ||
| .doc("When integral literals are used with decimals in binary operators, Spark will " + | ||
| "pick a precise precision for the literals to calculate the precision and scale " + |
a precise precision -> the minimal precision required to represent the given value?
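What "the minimal precision required to represent the given value" means in practice, as a standalone sketch (this helper is my own illustration, not Spark's implementation):

```scala
// Smallest (precision, scale) of a DecimalType able to hold an integral
// value: the number of decimal digits of the value, with scale 0.
def minimalPrecisionFor(value: Long): (Int, Int) = {
  // Go through BigInt to avoid the Long.MinValue overflow of math.abs.
  val digits = BigInt(value).abs.toString.length
  (math.max(digits, 1), 0)
}

println(minimalPrecisionFor(123L))  // (3, 0) rather than Long's conservative (20, 0)
println(minimalPrecisionFor(0L))    // (1, 0)
```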
}
| test("SPARK-25454: decimal division with negative scale") { | ||
| // TODO: completely fix this issue even LITERAL_PRECISE_PRECISION is true. |
mgaido91
left a comment
LGTM apart from a minor comment, thanks @cloud-fan.
Shall we also document this somehow in the migration guide or on the JIRA? Otherwise, how can users know about it?
nit: what about referencing SPARK-25454 in the title though? This seems more a temp fix for that specific issue than a follow-up of SPARK-22036 to me.
.createWithDefault(true)
val LITERAL_PICK_MINIMUM_PRECISION =
  buildConf("spark.sql.literal.pickMinimumPrecision")
nit: I am wondering whether we can consider this a legacy config. We can also remove it once the PR for SPARK-25454 is merged and/or we remove support for negative-scale decimals. What do you think?
Test build #96423 has finished for PR 22494 at commit
retest this please
Test build #96425 has finished for PR 22494 at commit
Test build #96429 has finished for PR 22494 at commit
Test build #96446 has finished for PR 22494 at commit
@mgaido91 Since we are going to merge this PR to both 2.4 and 2.3, and the default behavior is not changed, I think we don't need to add a migration guide. I'll post this workaround to the JIRA ticket when this PR is merged.
yes, sounds ok to me. Thanks. LGTM.
Test build #96509 has started for PR 22494 at commit
retest this please
Test build #96533 has finished for PR 22494 at commit
retest this please
Test build #96539 has finished for PR 22494 at commit
gatorsmile
left a comment
LGTM
Thanks! Merged to master/2.4/2.3
What changes were proposed in this pull request?
#20023 proposed to allow precision loss during decimal operations, to reduce the possibility of overflow. This is a behavior change and is protected by the DECIMAL_OPERATIONS_ALLOW_PREC_LOSS config. However, that PR introduced another behavior change: picking a minimum precision for integral literals, which is not protected by a config. This PR adds a new config for it:
`spark.sql.legacy.literal.pickMinimumPrecision`. This allows users to work around the issue in SPARK-25454, which is caused by a long-standing bug of negative scale.
How was this patch tested?
a new test
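As a closing illustration of what the new config controls, a hedged, self-contained sketch (the query, the expected result types, and the scaffolding are my assumptions, not the PR's actual test):

```scala
import org.apache.spark.sql.SparkSession

object PickMinimumPrecisionDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("pickMinimumPrecision demo")
      .getOrCreate()

    // The literal 2 is an Int. With the config on (the default), it is
    // typed with the minimal precision DECIMAL(1, 0); with it off, it
    // falls back to Int's conservative DECIMAL(10, 0), as before #20023,
    // and the result type of the addition widens accordingly.
    for (flag <- Seq("true", "false")) {
      spark.conf.set("spark.sql.literal.pickMinimumPrecision", flag)
      val df = spark.sql("SELECT CAST(1 AS DECIMAL(20, 18)) + 2 AS r")
      // Expected (my calculation): decimal(21,18) when true,
      // decimal(29,18) when false.
      println(s"pickMinimumPrecision=$flag -> ${df.schema("r").dataType}")
    }
    spark.stop()
  }
}
```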