
Conversation

@andygrove (Member) commented Jan 22, 2025

Which issue does this PR close?

Closes #1294 (Result mismatch with vanilla Spark in hash function with decimal input)

Rationale for this change

Fixes a correctness issue: Comet's native Murmur3Hash and XXHash64 produce results that differ from Spark's when hashing decimals with precision > 18.

What changes are included in this PR?

Fall back to Spark for Murmur3Hash and XXHash64 when hashing decimals with precision > 18.
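
For background, Spark hashes decimals along two different paths depending on precision. The sketch below is a simplification of Spark's HashExpression semantics, not the Comet code: up to 18 digits the unscaled value fits in a Long and that long is hashed, while wider decimals hash the two's-complement bytes of an arbitrary-precision unscaled value, which is the case the native implementation did not reproduce.

    // Simplified sketch of Spark's decimal hashing semantics (illustrative,
    // not the actual Spark or Comet source).
    import org.apache.spark.sql.types.Decimal

    def hashInput(d: Decimal, precision: Int): Either[Long, Array[Byte]] = {
      if (precision <= Decimal.MAX_LONG_DIGITS) { // MAX_LONG_DIGITS == 18
        // compact path: the decimal fits in a Long; the unscaled long is hashed
        Left(d.toUnscaledLong)
      } else {
        // wide path: hash the two's-complement bytes of the unscaled value
        Right(d.toJavaBigDecimal.unscaledValue().toByteArray)
      }
    }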

How are these changes tested?

@codecov-commenter commented Jan 22, 2025

Codecov Report

Attention: Patch coverage is 88.23529% with 4 lines in your changes missing coverage. Please review.

Project coverage is 39.13%. Comparing base (f09f8af) to head (0eef007).
Report is 11 commits behind head on main.

Files with missing lines                               Patch %   Lines
...k/src/main/scala/org/apache/comet/serde/hash.scala  87.50%    2 Missing and 2 partials ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##               main    #1325       +/-   ##
=============================================
- Coverage     56.12%   39.13%   -17.00%     
- Complexity      976     2065     +1089     
=============================================
  Files           119      262      +143     
  Lines         11743    60262    +48519     
  Branches       2251    12819    +10568     
=============================================
+ Hits           6591    23581    +16990     
- Misses         4012    32201    +28189     
- Partials       1140     4480     +3340     


@andygrove mentioned this pull request Jan 28, 2025
@andygrove marked this pull request as ready for review Jan 28, 2025
@andygrove self-assigned this Jan 28, 2025
@andygrove (Member Author)

@parthchandra @wForget fyi

@mbutrovich (Contributor)

Does this implicitly affect any data read that originated as uint64? I believe it gets converted to DECIMAL(20,0).

@andygrove (Member Author)

> Does this implicitly affect any data read that originated as uint64? I believe it gets converted to DECIMAL(20,0).

Yes, in the context of a user calling the hash or xxhash64 Spark SQL functions on that data. This PR does not change anything with respect to hashing as part of shuffle.
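
For illustration only, a hypothetical spark-shell snippet (not from this PR) showing the affected entry points: explicit hash/xxhash64 calls over a DECIMAL(20,0) column, such as one produced from uint64 Parquet data.

    import org.apache.spark.sql.functions.{col, hash, xxhash64}

    // DECIMAL(20,0) exceeds precision 18, so explicit hash()/xxhash64()
    // calls over this column fall back to Spark after this PR.
    val df = spark.sql("SELECT CAST('12345678901234567890' AS DECIMAL(20, 0)) AS c")
    df.select(hash(col("c")), xxhash64(col("c"))).show()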

def isSupportedType(expr: Expression): Boolean = {
  for (child <- expr.children) {
    child.dataType match {
      // wide decimals are not supported natively; fall back to Spark
      case dt: DecimalType if dt.precision > 18 => return false
      case _ =>
    }
  }
  true
}
Contributor commented on this hunk:
From the issue, it seemed that we get a test failure even when the precision is less than 18. So do we want to fall back to Spark for all DecimalType values?

@andygrove (Member Author) replied:
There was an earlier PR, #1295, that implemented the correct behavior for the case where precision is <= 18.
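
As a hedged illustration of the resulting behavior, the harness below is hypothetical (Murmur3Hash and AttributeReference are Spark catalyst classes; isSupportedType is the guard shown in the hunk above):

    import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Murmur3Hash}
    import org.apache.spark.sql.types.DecimalType

    val narrow = AttributeReference("d1", DecimalType(18, 2))()
    val wide   = AttributeReference("d2", DecimalType(20, 0))()

    // precision <= 18: handled natively; precision > 18: fall back to Spark
    assert(isSupportedType(Murmur3Hash(Seq(narrow), seed = 42)))
    assert(!isSupportedType(Murmur3Hash(Seq(wide), seed = 42)))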

@andygrove (Member Author)

Thanks for the review @kazuyukitanimura and @parthchandra

@andygrove merged commit e964947 into apache:main Jan 29, 2025 (74 checks passed)
@andygrove deleted the hash-decimal branch Jan 29, 2025