Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

experimental: Implement initial framework for cost-based optimizations to avoid moving to Comet in some cases #618

Closed
wants to merge 4 commits into from

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented Jul 1, 2024

Which issue does this PR close?

Part of #571

Rationale for this change

This is a first step towards a cost-based optimizer that can avoid moving from Spark to Comet in some cases where the benefit of moving to Comet could be outweighed by the overhead of the surrounding R2C and C2R transitions.

In this case we use a very specific rule to avoid moving to Comet for the final query stage if that query stage is just a SortExec (as we currently see with TPC-H q16).

Note that the actual performance difference of the optimization in this case is negligible because the final query stage is only processing a relatively small number of rows, but the value of this PR is getting the basic mechanism in place for future optimizations.

Before

Screenshot from 2024-07-01 11-21-12

After

Screenshot from 2024-07-01 11-21-42

What changes are included in this PR?

How are these changes tested?

@andygrove andygrove changed the title Implement optimization to avoid moving to Comet just for final Sort feat: Implement optimization to avoid moving to Comet just for final Sort Jul 1, 2024
@andygrove andygrove marked this pull request as ready for review July 1, 2024 17:40
@andygrove andygrove marked this pull request as draft July 1, 2024 18:17
@andygrove andygrove marked this pull request as ready for review July 1, 2024 19:20
@andygrove andygrove changed the title feat: Implement optimization to avoid moving to Comet just for final Sort feat: Implement initial framework for cost-based optimizations to avoid moving to Comet in some cases Jul 1, 2024
@andygrove andygrove marked this pull request as draft July 2, 2024 15:04
@andygrove
Copy link
Member Author

Moving to draft until I have unit tests

// fall back for sort amd exchange operators
val fallbackReason = "avoid move to Comet just for sort"
plan.setTagValue(CometExplainInfo.CBO_FALLBACK, fallbackReason)
plan.children.head.setTagValue(CometExplainInfo.CBO_FALLBACK, fallbackReason)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting the fallbackReason for the plan node should be enough I think. No need to set it for the child node.

@andygrove andygrove changed the title feat: Implement initial framework for cost-based optimizations to avoid moving to Comet in some cases experimental: Implement initial framework for cost-based optimizations to avoid moving to Comet in some cases Aug 15, 2024
@andygrove andygrove closed this Aug 15, 2024
@andygrove
Copy link
Member Author

There is no immediate benefit in merging this so I will close this issue for now. I wrote up my findings in #833 and may revisit this in the future

@andygrove andygrove reopened this Aug 15, 2024
@andygrove andygrove closed this Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants