-
Notifications
You must be signed in to change notification settings - Fork 28.5k
[SPARK-28200][SQL] Decimal overflow handling in ExpressionEncoder #25016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-28200][SQL] Decimal overflow handling in ExpressionEncoder #25016
Conversation
@mickjermsurawong-stripe, I think we also had a test case for |
jenkins this is ok to test |
testOverflowingBigNumeric(new BigInteger("9" * 100), "java very big int") | ||
|
||
private def testOverflowingBigNumeric[T: TypeTag](bigDecimal: T, testName: String): Unit = { | ||
for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seq("true", "false").foreach
Test build #107048 has finished for PR 25016 at commit
|
Test build #107052 has finished for PR 25016 at commit
|
encodeDecodeTest(new java.math.BigDecimal(("9" * 20) + "." + "9" * 18), | ||
"java decimal within precision/scale limit") | ||
|
||
encodeDecodeTest(BigDecimal(("9" * 20) + "." + "9" * 18).unary_-, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we use -BigDecimal...
instead of .unary_-
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly LGTM, just some style comments on the tests. Thanks.
Anyway, may you please also update the description of the PR? Because actually, the point here is not that we are not honoring the newly introduced flag, but previously we were not checking/handling at all the overflow in these situations. So I think we best should make this more clear in the PR description. Thanks.
} | ||
} | ||
|
||
withSQLConf(SQLConf.DECIMAL_OPERATIONS_NULL_ON_OVERFLOW.key -> true.toString) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
withSQLConf(SQLConf.DECIMAL_OPERATIONS_NULL_ON_OVERFLOW.key -> true.toString) { | |
withSQLConf(SQLConf.DECIMAL_OPERATIONS_NULL_ON_OVERFLOW.key -> "true") { |
} | ||
|
||
private def testDecimalOverflow(schema: StructType, row: Row): Unit = { | ||
withSQLConf(SQLConf.DECIMAL_OPERATIONS_NULL_ON_OVERFLOW.key -> false.toString) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
withSQLConf(SQLConf.DECIMAL_OPERATIONS_NULL_ON_OVERFLOW.key -> false.toString) { | |
withSQLConf(SQLConf.DECIMAL_OPERATIONS_NULL_ON_OVERFLOW.key -> "false") { |
@@ -162,6 +162,32 @@ class RowEncoderSuite extends CodegenInterpretedPlanTest { | |||
assert(row.toSeq(schema).head == decimal) | |||
} | |||
|
|||
test("RowEncoder should respect nullOnOverflow for decimals") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: may you please also add the Jira number to the test case? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. thank you Marco!
Test build #107062 has finished for PR 25016 at commit
|
Test build #107081 has finished for PR 25016 at commit
|
thanks, merging to master! |
What changes were proposed in this pull request?
ExpressionEncoder
does not handle bigdecimal overflow. Round-tripping overflowing java/scala BigDecimal/BigInteger returns null.changePrecision
will be false and row has null value.spark/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
Lines 202 to 206 in 24e1e41
ExpressionEncoder
to throw when detecting overflowing BigDecimal/BigInteger before its corresponding Decimal gets written to Row. This gives a consistent behavior between decimal arithmetic on sql expression (DecimalPrecision), and getting decimal from dataframe (RowEncoder)Thanks to @mgaido91 for the very first PR
SPARK-23179
and follow-up discussion on this change.Thanks to @JoshRosen for working with me on this.
How was this patch tested?
added unit tests