[SPARK-38932][SQL] Datasource v2 support report distinct keys #36253

ulysses-you · 2022-04-19T05:00:08Z

What changes were proposed in this pull request?

Add a new mix-in interface, SupportsReportDistinctKeys, for DataSource V2
Add a new method, reportDistinctKeysSet, to LeafNode
Override reportDistinctKeysSet in the DataSource V2 relation
Propagate reportDistinctKeysSet in DistinctKeysVisitor
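
To make the shape of the change easier to picture, here is a minimal, self-contained sketch. It does not reproduce the code in this PR: `Scan` below is a simplified stand-in rather than Spark's real interface, the method name `distinctKeysSet`, the `UsersScan` class, and its column names are all hypothetical, and keys are modelled as sets of column names rather than Catalyst expressions.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Simplified stand-in for a connector scan; NOT Spark's real Scan interface.
interface Scan {
    String description();
}

// Hypothetical shape of the proposed mix-in (method name is an assumption).
interface SupportsReportDistinctKeys extends Scan {
    // Each inner set is one group of columns whose combined values are unique per row.
    Set<Set<String>> distinctKeysSet();
}

// Hypothetical scan over a table with primary key `id` and a unique (tenant, email) pair.
final class UsersScan implements SupportsReportDistinctKeys {
    @Override
    public String description() {
        return "users";
    }

    @Override
    public Set<Set<String>> distinctKeysSet() {
        Set<Set<String>> keys = new LinkedHashSet<>();
        keys.add(Set.of("id"));
        keys.add(Set.of("tenant", "email"));
        return keys;
    }
}

final class DistinctKeysDemo {
    // Mirrors the leaf-node step: use the scan's keys if it opts in, otherwise report none.
    static Set<Set<String>> reportDistinctKeysSet(Scan scan) {
        if (scan instanceof SupportsReportDistinctKeys) {
            return ((SupportsReportDistinctKeys) scan).distinctKeysSet();
        }
        return Set.of();
    }

    public static void main(String[] args) {
        System.out.println(reportDistinctKeysSet(new UsersScan()));
    }
}
```

A scan that does not implement the mix-in simply reports no keys, so existing connectors would be unaffected under this design.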

Why are the changes needed?

DataSource V2 can be used to connect to databases that support unique keys.

The Spark Catalyst optimizer can perform further optimizations based on distinct keys, so performance can improve if the Scan reports its distinct keys to Spark.
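
As an illustration of why reported keys help, consider one check an optimizer could make: if the grouping columns of a GROUP BY or DISTINCT contain a full distinct key, every group holds exactly one row, so the deduplication does no work. The sketch below is an assumption-laden toy, not a Spark rule: the class and method names are hypothetical, and columns are plain strings rather than Catalyst expressions.

```java
import java.util.Set;

final class DistinctKeysOptimizationSketch {
    // Returns true when some reported distinct key is fully contained in the
    // grouping columns, which means each group contains exactly one row.
    static boolean groupingIsAlreadyDistinct(Set<Set<String>> distinctKeys,
                                             Set<String> groupingColumns) {
        for (Set<String> key : distinctKeys) {
            // Empty keys are skipped in this sketch to keep the check simple.
            if (!key.isEmpty() && groupingColumns.containsAll(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<Set<String>> keys = Set.of(Set.of("id"), Set.of("tenant", "email"));
        // Grouping by (id, name) covers the key {id}.
        System.out.println(groupingIsAlreadyDistinct(keys, Set.of("id", "name")));
        // Grouping by (tenant) covers no complete key.
        System.out.println(groupingIsAlreadyDistinct(keys, Set.of("tenant")));
    }
}
```

Without keys reported by the source, a check like this has nothing to work with at the leaf of the plan, which is the gap this PR aims to close.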

We already have several optimizer rules for distinct keys, for example:

We also have some in-progress PRs related to distinct keys, for example:

Does this PR introduce any user-facing change?

Yes, a new interface is added for developers.

How was this patch tested?

Added tests.

ulysses-you · 2022-04-19T06:11:03Z

cc @cloud-fan @sigmod @wangyum

sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsReportUniqueKeys.java

ulysses-you · 2022-07-29T02:42:31Z

cc @cloud-fan @huaxingao if you have time to take a look, thank you

github-actions · 2022-11-07T00:21:46Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions bot added the SQL label Apr 19, 2022

HyukjinKwon reviewed Apr 20, 2022

sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsReportUniqueKeys.java Outdated

ulysses-you force-pushed the SPARK-38932 branch 2 times, most recently from 34f72b2 to d8430c5 Compare April 27, 2022 12:35

Datasource v2 support report distinct keys

d412f8c

ulysses-you force-pushed the SPARK-38932 branch from d8430c5 to d412f8c Compare July 29, 2022 02:32

ulysses-you changed the title ~~[SPARK-38932][SQL] Datasource v2 support report unique keys~~ [SPARK-38932][SQL] Datasource v2 support report distinct keys Jul 29, 2022

github-actions bot added the Stale label Nov 7, 2022

github-actions bot closed this Nov 8, 2022
