-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32018][FOLLOWUP][Doc] Add migration guide for decimal value overflow in sum aggregation #29458
[SPARK-32018][FOLLOWUP][Doc] Add migration guide for decimal value overflow in sum aggregation #29458
Conversation
Test build #127528 has finished for PR 29458 at commit
|
docs/sql-migration-guide.md
Outdated
@@ -36,6 +36,10 @@ license: | | |||
|
|||
- In Spark 3.1, NULL elements of structures, arrays and maps are converted to "null" in casting them to strings. In Spark 3.0 or earlier, NULL elements are converted to empty strings. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.castComplexTypesToString.enabled` to `true`. | |||
|
|||
- In Spark 3.1, when `spark.sql.ansi.enabled` is false, sum aggregation of decimal type column always returns `null` on decimal value overflow. In Spark 3.0 or earlier, when `spark.sql.ansi.enabled` is false and decimal value overflow happens in sum aggregation of decimal type column: | |||
- If it is hash aggregation with `group by` clause, a runtime exception is thrown. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not many users know the physical nodes. How about
In Spark 3.1, Spark always returns null if the sum of decimal overflows under non-ANSI
mode (`spark.sql.ansi.enabled` is false). In Spark 3.0 or earlier, the sum of decimal may
fail at runtime under non-ANSI mode (when the query has GROUP BY and is planned as hash aggregate)
docs/sql-migration-guide.md
Outdated
@@ -36,6 +36,10 @@ license: | | |||
|
|||
- In Spark 3.1, NULL elements of structures, arrays and maps are converted to "null" in casting them to strings. In Spark 3.0 or earlier, NULL elements are converted to empty strings. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.castComplexTypesToString.enabled` to `true`. | |||
|
|||
- In Spark 3.1, when `spark.sql.ansi.enabled` is false, sum aggregation of decimal type column always returns `null` on decimal value overflow. In Spark 3.0 or earlier, when `spark.sql.ansi.enabled` is false and decimal value overflow happens in sum aggregation of decimal type column: | |||
- If it is hash aggregation with `group by` clause, a runtime exception is thrown. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not many users know the physical nodes. How about
In Spark 3.1, Spark always returns null if the sum of decimal overflows under non-ANSI
mode (`spark.sql.ansi.enabled` is false). In Spark 3.0 or earlier, the sum of decimal may
fail at runtime under non-ANSI mode (when the query has GROUP BY and is planned as hash aggregate)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name "non-ANSI mode" is a bit wired.
Also, we have to mention that Spark 3.0 or earlier returns null under certain conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use "default mode".
I don't see a difference between "may fail at runtime" or "may return null". They are mutually exclusive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have updated the doc and screenshot
Test build #127559 has finished for PR 29458 at commit
|
@@ -36,6 +36,8 @@ license: | | |||
|
|||
- In Spark 3.1, NULL elements of structures, arrays and maps are converted to "null" in casting them to strings. In Spark 3.0 or earlier, NULL elements are converted to empty strings. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.castComplexTypesToString.enabled` to `true`. | |||
|
|||
- In Spark 3.1, when `spark.sql.ansi.enabled` is false, Spark always returns null if the sum of decimal type column overflows. In Spark 3.0 or earlier, when `spark.sql.ansi.enabled` is false, the sum of decimal type column may return null or incorrect result, or even fails at runtime (depending on the actual query plan execution). | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we need to describe spark.sql.ansi.enabled is false
two times? I think its okay just to describe it like this;
In Spark 3.0 or earlier, the sum of...
or
In Spark 3.0 or earlier, in the case, the sum of...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for the one minor comment.
Merging to master |
What changes were proposed in this pull request?
Add migration guide for decimal value overflow behavior in sum aggregation, introduced in #29026
Why are the changes needed?
Add migration guide for the behavior changes from 3.0 to 3.1.
See also: #29450 (comment)
Does this PR introduce any user-facing change?
No
How was this patch tested?
Build docs and preview: