-
Notifications
You must be signed in to change notification settings - Fork 1.5k
fix: Fix SparkSha2 to be compliant with Spark response and add support for Int32 #16350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@andygrove here are the SHA2 changes to be compliant with Spark. |
@getChan fyi |
@@ -192,7 +195,7 @@ pub fn spark_hex(args: &[ColumnarValue]) -> Result<ColumnarValue, DataFusionErro | |||
|
|||
let hexed: StringArray = array | |||
.iter() | |||
.map(|v| v.map(hex_bytes).transpose()) | |||
.map(|v| v.map(|b| hex_bytes(b, true)).transpose()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for contribution.
However, the hex function shuold not be modified.
@@ -192,7 +195,7 @@ pub fn spark_hex(args: &[ColumnarValue]) -> Result<ColumnarValue, DataFusionErro | |||
|
|||
let hexed: StringArray = array | |||
.iter() | |||
.map(|v| v.map(hex_bytes).transpose()) | |||
.map(|v| v.map(|b| hex_bytes(b, true)).transpose()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that Spark's hex function is expected to return uppercase characters.
https://spark.apache.org/docs/latest/api/sql/index.html#hex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @getChan for pointing this out. In that case, I will make change for SHA2 hex representation only. Here is one example from spark-shell of SHA2,
scala> spark.sql("SELECT sha2('abc', 256) AS sha256_value").show(false)
+----------------------------------------------------------------+
|sha256_value |
+----------------------------------------------------------------+
|ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad|
+----------------------------------------------------------------+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change to the hex function did not cause any test failures ... does this mean we are missing some tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do have hex related test cases in Comet but they seems to be passing.
Looks like this is happening because Comet has its own hex-function, for which we call hex_bytes. But for hex-bytes lower case is set to false. Hence, no failure.
So, looks like the Datafusion's hex functionality is not getting leveraged and Comet is handling this on its own, hence we did not see regression on the Comet test-suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant that we should have seen test failures in DataFusion after changing the hex behavior. It looks like we should add an slt test for hex as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have test in Datafusion also datafusion/sqllogictest/test_files/spark/math/hex.slt but my PR changes the cases to lower. I will have to revert them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry. I totally missed that when reviewing.
Thanks @andygrove / @getChan for the feedback. Please review the changes. |
FYI @shehabgamin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for refactoring, improvement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rishvin @shehabgamin and @getChan
Which issue does this PR close?
Rationale for this change
Please see here.
What changes are included in this PR?
Int32
type for sha2bit_length
argument.SparkSha2
response to return hex output in lowercase.SparkSha2
to not throw if the bit-length is not supported and instead return NULL.hash/sha2.slt
accordingly.Are these changes tested?
Tested the following ways,
hash/sha2.slt
with the updated changes.Are there any user-facing changes?
Yes, the sha2 function can be specified in the SQL.
The SHA2 output changes in 2 ways with this change,