[SPARK-38932][SQL] Datasource v2 support report distinct keys #36253

Closed
wants to merge 1 commit into from

Conversation

ulysses-you
Contributor

@ulysses-you ulysses-you commented Apr 19, 2022

What changes were proposed in this pull request?

  • Add a new mix-in interface, SupportsReportDistinctKeys, for DataSource V2
  • Add a new method, reportDistinctKeysSet, to LeafNode
  • Override reportDistinctKeysSet in the DataSource V2 relation
  • Propagate reportDistinctKeysSet in DistinctKeysVisitor
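
As a rough illustration of how these pieces could fit together, here is a minimal, self-contained sketch. It does not use Spark's real classes: SketchScan, SketchSupportsReportDistinctKeys, SketchRelation, and SketchUsersScan are hypothetical stand-ins, and the method signature distinctKeysSet() is an assumption, since this description does not show the actual interface. The relation also drops any reported key whose columns are not all in the scan output, which is an assumed design choice for the sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a DSv2 scan; NOT Spark's real Scan interface.
interface SketchScan {
    List<String> outputColumns();
}

// Assumed shape of the proposed mix-in: each inner set is a group of
// columns whose combined values are unique per row.
interface SketchSupportsReportDistinctKeys extends SketchScan {
    Set<Set<String>> distinctKeysSet();
}

// Hypothetical stand-in for the leaf relation that wraps a scan.
final class SketchRelation {
    private final SketchScan scan;

    SketchRelation(SketchScan scan) {
        this.scan = scan;
    }

    // Returns the reported keys that are fully covered by the scan output.
    // A scan without the mix-in reports nothing.
    Set<Set<String>> reportDistinctKeysSet() {
        Set<Set<String>> usable = new LinkedHashSet<>();
        if (scan instanceof SketchSupportsReportDistinctKeys) {
            SketchSupportsReportDistinctKeys reporting =
                (SketchSupportsReportDistinctKeys) scan;
            for (Set<String> key : reporting.distinctKeysSet()) {
                if (scan.outputColumns().containsAll(key)) {
                    usable.add(key);
                }
            }
        }
        return usable;
    }
}

// Example source: "id" and "email" are each unique, but only "id"
// is part of the (pruned) output.
final class SketchUsersScan implements SketchSupportsReportDistinctKeys {
    @Override
    public List<String> outputColumns() {
        return Arrays.asList("id", "name");
    }

    @Override
    public Set<Set<String>> distinctKeysSet() {
        Set<Set<String>> keys = new LinkedHashSet<>();
        keys.add(new HashSet<>(Arrays.asList("id")));
        keys.add(new HashSet<>(Arrays.asList("email")));
        return keys;
    }
}
```

In this sketch, new SketchRelation(new SketchUsersScan()).reportDistinctKeysSet() yields only the key on id, because email is not among the output columns.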

Why are the changes needed?

Datasource v2 can be used to connect to databases that support unique keys.

The Spark Catalyst optimizer can apply further optimizations based on distinct keys, so performance can improve if the Scan reports its distinct keys to Spark.
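
To make the benefit concrete: if a set of columns already contains a complete distinct key, rows are guaranteed unique on those columns, so a deduplication step over them does no work and could be skipped. The sketch below only illustrates that check; it is not Spark's optimizer code, and DistinctKeyCheckSketch and isDistinctOn are hypothetical names.

```java
import java.util.Set;

final class DistinctKeyCheckSketch {
    // True when at least one reported distinct key is fully contained in
    // the given columns, meaning rows are already unique on those columns.
    static boolean isDistinctOn(Set<Set<String>> distinctKeys, Set<String> columns) {
        for (Set<String> key : distinctKeys) {
            if (columns.containsAll(key)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, with reported keys {id} and {tenant, email}, a deduplication over (id, name) would be redundant, while one over (email, name) would not, because email alone is only part of a key.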

We already have several optimizer rules for distinct keys, for example:

There are also some in-progress PRs related to distinct keys, for example:

Does this PR introduce any user-facing change?

Yes, a new interface is added for developers.

How was this patch tested?

Added tests.

@github-actions github-actions bot added the SQL label Apr 19, 2022
@ulysses-you
Contributor Author

cc @cloud-fan @sigmod @wangyum

@ulysses-you ulysses-you force-pushed the SPARK-38932 branch 2 times, most recently from 34f72b2 to d8430c5 Compare April 27, 2022 12:35
@ulysses-you ulysses-you changed the title [SPARK-38932][SQL] Datasource v2 support report unique keys [SPARK-38932][SQL] Datasource v2 support report distinct keys Jul 29, 2022
@ulysses-you
Contributor Author

cc @cloud-fan @huaxingao if you have time to take a look, thank you

@github-actions

github-actions bot commented Nov 7, 2022

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Nov 7, 2022
@github-actions github-actions bot closed this Nov 8, 2022