Skip to content

Conversation

linhongliu-db
Copy link
Contributor

What changes were proposed in this pull request?

This PR fixes a conflict between RewriteDistinctAggregates and DecimalAggregates.
In some cases, DecimalAggregates will wrap the decimal column to UnscaledValue using
different rules for different aggregates.

This means, same distinct column with different aggregates will change to different distinct columns
after DecimalAggregates. For example:
avg(distinct decimal_col), sum(distinct decimal_col) may change to
avg(distinct UnscaledValue(decimal_col)), sum(distinct decimal_col)

We assume after RewriteDistinctAggregates, there will be at most one distinct column in aggregates,
but DecimalAggregates breaks this assumption. To fix this, we have to switch the order of these two
rules.

Why are the changes needed?

bug fix

Does this PR introduce any user-facing change?

no

How was this patch tested?

added test cases

Authored-by: Linhong Liu linhong.liu@databricks.com
Signed-off-by: Wenchen Fan wenchen@databricks.com
(cherry picked from commit 40ef5c9)

@SparkQA
Copy link

SparkQA commented Oct 15, 2020

Test build #129826 has finished for PR 30053 at commit b043f94.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 15, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34429/

@SparkQA
Copy link

SparkQA commented Oct 15, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34429/

@linhongliu-db
Copy link
Contributor Author

This PR depends on #30052 to pass tests

…t DECIMAL columns

This PR fixes a conflict between `RewriteDistinctAggregates` and `DecimalAggregates`.
In some cases, `DecimalAggregates` will wrap the decimal column to `UnscaledValue` using
different rules for different aggregates.

This means, same distinct column with different aggregates will change to different distinct columns
after `DecimalAggregates`. For example:
`avg(distinct decimal_col), sum(distinct decimal_col)` may change to
`avg(distinct UnscaledValue(decimal_col)), sum(distinct decimal_col)`

We assume after `RewriteDistinctAggregates`, there will be at most one distinct column in aggregates,
but `DecimalAggregates` breaks this assumption. To fix this, we have to switch the order of these two
rules.

bug fix

no

added test cases

Closes apache#29673 from linhongliu-db/SPARK-32816.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 40ef5c9)
@SparkQA
Copy link

SparkQA commented Oct 16, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34481/

@SparkQA
Copy link

SparkQA commented Oct 16, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34481/

@SparkQA
Copy link

SparkQA commented Oct 16, 2020

Test build #129876 has finished for PR 30053 at commit 2634588.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@linhongliu-db
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Oct 16, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34486/

@SparkQA
Copy link

SparkQA commented Oct 16, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34486/

@SparkQA
Copy link

SparkQA commented Oct 16, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34487/

@SparkQA
Copy link

SparkQA commented Oct 16, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34487/

@SparkQA
Copy link

SparkQA commented Oct 16, 2020

Test build #129881 has finished for PR 30053 at commit 2634588.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 16, 2020

Test build #129882 has finished for PR 30053 at commit 96a0706.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@linhongliu-db
Copy link
Contributor Author

@cloud-fan This PR depends on another 2 fixes in the master branch. Should we cherry-pick them?
#29026
#27627

@cloud-fan
Copy link
Contributor

ah, that two are hard to backport as they change streaming state store. I guess we can't backport this fix either.

@linhongliu-db
Copy link
Contributor Author

got it. Let's close it for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants