[GLUTEN-4999] Fix ColumnarUnionExec to get PartitionerAwareUnionRDD used if child RDDs share same partitioner #5021

Merged (1 commit) on Mar 20, 2024


guixiaowen
Contributor

…etter to be transformed to PartitionerAwareUnionRDD than UnionRDD when they have the same partitioner. #4999

What changes were proposed in this pull request?

For example:

select * from test a
union all
select * from test b
union all
select * from test c

The three children have the same partitioner.

Before this PR, ColumnarUnionExec transforms them to rdd1.union(rdd2).union(rdd3).

After this PR, ColumnarUnionExec transforms them to PartitionerAwareUnionRDD(sc, Seq(rdd1, rdd2, rdd3)) if they have the same partitioner.
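To illustrate the selection rule described above, here is a minimal sketch in plain Scala. It does not use Spark or Gluten: `SimplePartitioner`, `SimpleRdd`, `PlainUnion`, `PartitionerAwareUnion`, and `UnionChooser` are hypothetical stand-ins invented for this example, and the logic is an assumption about the rule (every child has a partitioner and they are all equal), not the code in this patch.

```scala
// Simplified stand-ins for a Spark partitioner and RDD (hypothetical, not Spark classes).
final case class SimplePartitioner(name: String, numPartitions: Int)
final case class SimpleRdd(name: String, numPartitions: Int, partitioner: Option[SimplePartitioner])

sealed trait UnionPlan {
  def numPartitions: Int
  def partitioner: Option[SimplePartitioner]
}

// Plain union: concatenates the children's partitions and drops any partitioner.
final case class PlainUnion(children: Seq[SimpleRdd]) extends UnionPlan {
  def numPartitions: Int = children.map(_.numPartitions).sum
  def partitioner: Option[SimplePartitioner] = None
}

// Partitioner-aware union: merges the i-th partition of every child,
// so the partition count and the partitioner are preserved.
final case class PartitionerAwareUnion(children: Seq[SimpleRdd]) extends UnionPlan {
  def numPartitions: Int = children.head.numPartitions
  def partitioner: Option[SimplePartitioner] = children.head.partitioner
}

object UnionChooser {
  // Use the partitioner-aware union only if every child defines a partitioner
  // and all of those partitioners are equal; otherwise fall back to a plain union.
  def choose(children: Seq[SimpleRdd]): UnionPlan = {
    val partitioners = children.flatMap(_.partitioner).toSet
    if (children.nonEmpty && children.forall(_.partitioner.isDefined) && partitioners.size == 1)
      PartitionerAwareUnion(children)
    else
      PlainUnion(children)
  }
}
```

In this model, three children with 4 partitions each and the same partitioner yield a union with 4 partitions that keeps the partitioner, while a plain union of the same children yields 12 partitions and no partitioner.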

(Fixes: #4999)

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)



Run Gluten Clickhouse CI

@Yohahaha
Contributor

Thank you for this fix. I have verified that Gluten's union RDD is the same as Spark's after this PR.

[Image: Spark]

[Image: Gluten before]

[Image: Gluten after]

Please make the PR title clearer.

@Yohahaha requested a review from PHILO-HE on March 20, 2024, 02:26
Contributor

@PHILO-HE left a comment


Looks good! Thanks!

@PHILO-HE changed the title on Mar 20, 2024
from: [GLUTEN-4999]In ColumnarUnionExec, it would be better to be transformed to PartitionerAwareUnionRDD than UnionRDD when they has same partitioner
to: [GLUTEN-4999] Fix ColumnarUnionExec to get PartitionerAwareUnionRDD used if child RDDs share same partitioner
@Yohahaha merged commit c85db55 into apache:main on Mar 20, 2024
18 checks passed