-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns #30053
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Test build #129826 has finished for PR 30053 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status failure |
This PR depends on #30052 to pass tests |
…t DECIMAL columns This PR fixes a conflict between `RewriteDistinctAggregates` and `DecimalAggregates`. In some cases, `DecimalAggregates` will wrap the decimal column to `UnscaledValue` using different rules for different aggregates. This means, same distinct column with different aggregates will change to different distinct columns after `DecimalAggregates`. For example: `avg(distinct decimal_col), sum(distinct decimal_col)` may change to `avg(distinct UnscaledValue(decimal_col)), sum(distinct decimal_col)` We assume after `RewriteDistinctAggregates`, there will be at most one distinct column in aggregates, but `DecimalAggregates` breaks this assumption. To fix this, we have to switch the order of these two rules. bug fix no added test cases Closes apache#29673 from linhongliu-db/SPARK-32816. Authored-by: Linhong Liu <linhong.liu@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 40ef5c9)
b043f94
to
2634588
Compare
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #129876 has finished for PR 30053 at commit
|
retest this please |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #129881 has finished for PR 30053 at commit
|
Test build #129882 has finished for PR 30053 at commit
|
@cloud-fan This PR depends on another 2 fixes in the master branch. Should we cherry-pick them? |
ah, that two are hard to backport as they change streaming state store. I guess we can't backport this fix either. |
got it. Let's close it for now |
What changes were proposed in this pull request?
This PR fixes a conflict between
RewriteDistinctAggregates
andDecimalAggregates
.In some cases,
DecimalAggregates
will wrap the decimal column toUnscaledValue
usingdifferent rules for different aggregates.
This means, same distinct column with different aggregates will change to different distinct columns
after
DecimalAggregates
. For example:avg(distinct decimal_col), sum(distinct decimal_col)
may change toavg(distinct UnscaledValue(decimal_col)), sum(distinct decimal_col)
We assume after
RewriteDistinctAggregates
, there will be at most one distinct column in aggregates,but
DecimalAggregates
breaks this assumption. To fix this, we have to switch the order of these tworules.
Why are the changes needed?
bug fix
Does this PR introduce any user-facing change?
no
How was this patch tested?
added test cases
Authored-by: Linhong Liu linhong.liu@databricks.com
Signed-off-by: Wenchen Fan wenchen@databricks.com
(cherry picked from commit 40ef5c9)