[SPARK-38832][SQL][FOLLOWUP] Support propagate empty expression set for distinct key #36281

Closed
wants to merge 2 commits into from

ulysses-you
Contributor

What changes were proposed in this pull request?

  • Improve DistinctKeyVisitor so that it supports propagating an empty expression set
  • Small improvement to alias matching

Why are the changes needed?

Allow distinct keys to be used to optimize more cases; see comment #36117 (comment)

Does this PR introduce any user-facing change?

Improve performance

How was this patch tested?

Added tests.

@ulysses-you
Contributor Author

cc @cloud-fan @sigmod @wangyum

@github-actions github-actions bot added the SQL label Apr 20, 2022
@ulysses-you ulysses-you force-pushed the SPARK-38832-followup branch from f2ceed8 to 1f0185f Compare April 21, 2022 01:37
Member

@wangyum wangyum left a comment

Could we filter out empty distinct keys, to be safe?

agg.child.distinctKeys.exists(
_.subsetOf(ExpressionSet(ae.aggregateFunction.children.filterNot(_.foldable)))) =>

@cloud-fan
Contributor

@wangyum what's wrong with that code?

@wangyum
Member

wangyum commented Apr 21, 2022

For example:

Set[ExpressionSet](ExpressionSet()).exists(_.subsetOf(ExpressionSet(ae.aggregateFunction.children.filterNot(_.foldable))))

This is always true, because the empty set is a subset of every set, so it may cause problems.
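To illustrate the concern, here is a minimal sketch using plain Scala sets of column names as a stand-in for Spark's ExpressionSet. The names `KeySet`, `coveredByDistinctKey`, `onlyEmptyKey`, and `realKey` are hypothetical and are not part of Spark; the sketch only mirrors the shape of the `exists(_.subsetOf(...))` check quoted above.

```scala
object EmptyDistinctKeyDemo {
  // Assumption: a set of column names stands in for Spark's ExpressionSet.
  type KeySet = Set[String]

  // Same shape as the check in the review: is some distinct key fully
  // contained in the (non-foldable) children of the aggregate function?
  def coveredByDistinctKey(distinctKeys: Set[KeySet], aggChildren: KeySet): Boolean =
    distinctKeys.exists(_.subsetOf(aggChildren))

  // A child plan whose only distinct key is the empty set.
  val onlyEmptyKey: Set[KeySet] = Set(Set.empty[String])

  // A child plan that is distinct by the pair (a, b).
  val realKey: Set[KeySet] = Set(Set("a", "b"))

  def main(args: Array[String]): Unit = {
    // The empty key is a subset of anything, so the check always passes.
    println(coveredByDistinctKey(onlyEmptyKey, Set("x")))   // true
    println(coveredByDistinctKey(onlyEmptyKey, Set.empty))  // true
    // A non-empty key passes only when all of its columns are present.
    println(coveredByDistinctKey(realKey, Set("a")))        // false
  }
}
```

Whether "always true" is a problem depends on whether an empty distinct key can only appear when the data really has at most one row, which is the point debated in the rest of the thread.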

@ulysses-you
Contributor Author

If we get an unexpected empty ExpressionSet, that would be a bug. Shall we trust the framework?

@cloud-fan
Contributor

I don't see why an empty expression set is special. If the framework has bugs and produces incorrect distinct keys, we will have query correctness bugs.

@wangyum
Member

wangyum commented Apr 21, 2022

I'm ok if you do not think it is needed.

@cloud-fan
Contributor

cloud-fan commented Apr 21, 2022

My point is that an empty expression set is informative and we shouldn't skip it. It means the data is distinct by Nil, so it has at most 1 row, and this key can override any other distinct keys.
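The "distinct by Nil means at most 1 row" argument can be checked with a small sketch on in-memory rows. This is plain Scala, not Spark code; `Row`, `isDistinctBy`, `singleRow`, and `twoRows` are hypothetical names introduced only for illustration.

```scala
object AtMostOneRowDemo {
  // Assumption: a row is modeled as a map from column name to value.
  type Row = Map[String, Int]

  // True when no two rows agree on every column in `key`.
  // With an empty key, every row projects to the same empty tuple,
  // so the result is true only if there is at most one row.
  def isDistinctBy(rows: Seq[Row], key: Set[String]): Boolean = {
    val columns = key.toSeq.sorted
    val projected = rows.map(row => columns.map(c => row(c)))
    projected.distinct.size == projected.size
  }

  val singleRow: Seq[Row] = Seq(Map("a" -> 1, "b" -> 1))
  val twoRows: Seq[Row] = Seq(Map("a" -> 1, "b" -> 1), Map("a" -> 1, "b" -> 2))

  def main(args: Array[String]): Unit = {
    // One row: distinct by the empty key, and therefore by every other key.
    println(isDistinctBy(singleRow, Set.empty))      // true
    println(isDistinctBy(singleRow, Set("a")))       // true
    // Two rows: never distinct by the empty key, whatever their values are.
    println(isDistinctBy(twoRows, Set.empty))        // false
    println(isDistinctBy(twoRows, Set("a", "b")))    // true
  }
}
```

In this model an empty distinct key implies every other key set is also a distinct key, which is why it can replace the others rather than being filtered out.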

Member

@wangyum wangyum left a comment

LGTM

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 80929d6 Apr 22, 2022
@ulysses-you ulysses-you deleted the SPARK-38832-followup branch April 22, 2022 12:27
@ulysses-you
Contributor Author

thank you @wangyum @cloud-fan
