
[SPARK-32018][FOLLOWUP][Doc] Add migration guide for decimal value overflow in sum aggregation #29458

Closed

Conversation

@gengliangwang (Member) commented Aug 18, 2020

What changes were proposed in this pull request?

Add migration guide for decimal value overflow behavior in sum aggregation, introduced in #29026

Why are the changes needed?

Add migration guide for the behavior changes from 3.0 to 3.1.
See also: #29450 (comment)

Does this PR introduce any user-facing change?

No

How was this patch tested?

Build docs and preview:
[screenshot of the built docs preview]

@SparkQA commented Aug 18, 2020

Test build #127528 has finished for PR 29458 at commit ac10d8e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -36,6 +36,10 @@ license: |

- In Spark 3.1, NULL elements of structures, arrays and maps are converted to "null" in casting them to strings. In Spark 3.0 or earlier, NULL elements are converted to empty strings. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.castComplexTypesToString.enabled` to `true`.

- In Spark 3.1, when `spark.sql.ansi.enabled` is false, sum aggregation of decimal type column always returns `null` on decimal value overflow. In Spark 3.0 or earlier, when `spark.sql.ansi.enabled` is false and decimal value overflow happens in sum aggregation of decimal type column:
- If it is hash aggregation with `group by` clause, a runtime exception is thrown.
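For context (this illustration is not part of the PR), a minimal Scala sketch of the behavior the hunk above documents; the session setup, app name, and column names `d`/`g` are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

// Hypothetical local session, for illustration only.
val spark = SparkSession.builder()
  .appName("decimal-sum-overflow-sketch")
  .master("local[*]")
  .getOrCreate()

// Default mode, i.e. `spark.sql.ansi.enabled` is false.
spark.conf.set("spark.sql.ansi.enabled", "false")

// Ten rows, each holding the largest DECIMAL(38,0) value, so their sum no
// longer fits in the result's decimal precision and overflows.
val df = spark.range(10).selectExpr(
  "CAST('99999999999999999999999999999999999999' AS DECIMAL(38,0)) AS d",
  "'k' AS g")

// Per the note above: with a `group by`, Spark 3.0 or earlier plans this as a
// hash aggregation and throws a runtime exception on overflow, while Spark 3.1
// returns null instead.
df.groupBy("g").agg(sum("d")).show()
```

Which physical aggregation gets planned depends on the query, which is what the review comment below is getting at.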
A Contributor commented:

Not many users know the physical nodes. How about:

In Spark 3.1, Spark always returns null if the sum of decimal overflows under non-ANSI
mode (`spark.sql.ansi.enabled` is false). In Spark 3.0 or earlier, the sum of decimal may
fail at runtime under non-ANSI mode (when the query has GROUP BY and is planned as hash aggregate)


@gengliangwang (Member, author) commented Aug 18, 2020

The name "non-ANSI mode" is a bit weird.
Also, we have to mention that Spark 3.0 or earlier returns null under certain conditions.

@cloud-fan (Contributor) commented Aug 18, 2020

We can use "default mode".

I don't see a difference between "may fail at runtime" and "may return null". They are mutually exclusive.

@gengliangwang (Member, author) replied:

Thanks, I have updated the doc and screenshot.

@SparkQA commented Aug 18, 2020

Test build #127559 has finished for PR 29458 at commit a6b03f8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -36,6 +36,8 @@ license: |

- In Spark 3.1, NULL elements of structures, arrays and maps are converted to "null" in casting them to strings. In Spark 3.0 or earlier, NULL elements are converted to empty strings. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.castComplexTypesToString.enabled` to `true`.

- In Spark 3.1, when `spark.sql.ansi.enabled` is false, Spark always returns null if the sum of decimal type column overflows. In Spark 3.0 or earlier, when `spark.sql.ansi.enabled` is false, the sum of decimal type column may return null or incorrect result, or even fails at runtime (depending on the actual query plan execution).
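As a quick, hypothetical illustration of this final wording (not part of the PR; it reuses the `spark` session and the hypothetical 38-digit values from the sketch above, and the subquery alias `t` is made up), the same overflowing sum can be run with ANSI mode off and on:

```scala
// An overflowing decimal sum: ten copies of the largest DECIMAL(38,0) value.
val overflowQuery =
  """SELECT sum(d)
    |FROM (
    |  SELECT CAST('99999999999999999999999999999999999999' AS DECIMAL(38,0)) AS d
    |  FROM range(10)
    |) AS t""".stripMargin

// Default mode (`spark.sql.ansi.enabled` is false): Spark 3.1 returns null for
// the overflowing sum, as the migration note above states.
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql(overflowQuery).show()

// With ANSI mode enabled, the sum fails at runtime on overflow instead of
// returning null (the behavior change introduced in #29026).
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql(overflowQuery).show()
```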

A Member commented:

nit: do we need to describe `spark.sql.ansi.enabled` is false twice? I think it's okay just to describe it like this:

In Spark 3.0 or earlier, the sum of...

or

In Spark 3.0 or earlier, in the case, the sum of...

@gengliangwang (Member, author) replied:

@maropu Thanks

@maropu (Member) left a comment:

LGTM except for the one minor comment.

@gengliangwang (Member, author) commented:
Merging to master
