[SPARK-27482][SQL][WEBUI] Show BroadcastHashJoinExec numOutputRows statistics info on SparkSQL UI page #24389
… on SparkSQL UI page
I agree with the use case, but the implementation is too hacky. We need a general approach to propagate the statistics from logical plan to physical plan.
Thanks for your comments. How about:
Your feedback is appreciated. @cloud-fan @dongjoon-hyun
…from logical plan to physical plan
Propagate the statistics from logical plan to physical plan in Strategy.apply method instead
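The idea in this commit message, attaching the logical plan's statistics to whatever physical plans a strategy produces, can be sketched as a toy model. This is Python rather than Spark's Scala, and `Statistics`, `LogicalJoin`, `PhysicalJoin`, `plan_join`, and `apply_with_stats` are all hypothetical names chosen for illustration; none of them is Spark's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Statistics:
    """Estimated size and row count of a logical node (hypothetical shape)."""
    size_in_bytes: int
    row_count: Optional[int] = None


@dataclass
class LogicalJoin:
    """Stand-in for a logical join node that already carries estimates."""
    stats: Statistics


@dataclass
class PhysicalJoin:
    """Stand-in for a physical join operator; stats start out unknown."""
    name: str
    stats: Optional[Statistics] = None


def plan_join(logical: LogicalJoin) -> List[PhysicalJoin]:
    """Hypothetical strategy body: returns candidate physical plans."""
    return [PhysicalJoin("BroadcastHashJoin")]


def apply_with_stats(
    strategy: Callable[[LogicalJoin], List[PhysicalJoin]],
    logical: LogicalJoin,
) -> List[PhysicalJoin]:
    """Run a strategy, then copy the logical node's stats onto each result."""
    candidates = strategy(logical)
    for physical in candidates:
        physical.stats = logical.stats
    return candidates
```

Doing the copy in one wrapper around the strategy call, rather than inside each operator, means individual strategies do not need to know about statistics at all.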
ok to test
Test build #105235 has finished for PR 24389 at commit
…ion strategy compability
@dongjoon-hyun @cloud-fan
retest it please
Test build #105279 has finished for PR 24389 at commit
Thanks. @cloud-fan has created a more general approach.
Hi @pengbo, my PR just adds the foundation: the physical plan knows the statistics of its corresponding logical plan. But we still need some work to make the statistics available in the SQL UI. We still need your PR after my PR is merged.
Thanks for the info. I will open another one when your PR is merged.
It's pretty useful if we can convert a physical plan back to a logical plan, e.g., in apache#24389.

This PR introduces a new feature to `TreeNode`, which allows `TreeNode` to carry some extra information via a mutable map, and keep the information when it's copied. The planner leverages this feature to put the logical plan into the physical plan.

Tested with a test suite that runs all TPCDS queries and checks that some common physical plans contain the corresponding logical plans.

Closes apache#24626 from cloud-fan/link.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Peng Bo <bo.peng1019@gmail.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
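The mechanism described in that commit message, a tree node that carries extra information in a mutable map and keeps it when copied, can be illustrated with a minimal sketch. This is a toy Python model, not Spark's Scala `TreeNode`; the method names `set_tag`, `get_tag`, and `with_new_children` and the key `LOGICAL_PLAN_TAG` are assumptions made for the example.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical key under which a planner would store the logical plan.
LOGICAL_PLAN_TAG = "logical_plan"


class TreeNode:
    """Toy tree node with a mutable tag map that survives copying."""

    def __init__(self, name: str, children: Tuple["TreeNode", ...] = ()):
        self.name = name
        self.children = tuple(children)
        self._tags: Dict[str, Any] = {}  # side channel, not part of the tree shape

    def set_tag(self, key: str, value: Any) -> None:
        self._tags[key] = value

    def get_tag(self, key: str) -> Optional[Any]:
        return self._tags.get(key)

    def with_new_children(self, children: Tuple["TreeNode", ...]) -> "TreeNode":
        """Return a copy with different children, carrying the tags over."""
        copy = TreeNode(self.name, children)
        copy._tags.update(self._tags)
        return copy


# A planner could tag the physical node with the logical plan it came from.
scan = TreeNode("Scan")
join = TreeNode("BroadcastHashJoin", (scan,))
join.set_tag(LOGICAL_PLAN_TAG, "Join (logical)")

# Later rewrites copy the node; the link back to the logical plan is kept.
rewritten = join.with_new_children((TreeNode("Filter", (scan,)),))
```

Copying the tags inside the copy method is the important part: plan rewrites create new node instances all the time, so a tag that lived only on the original instance would be lost after the first transformation.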
What changes were proposed in this pull request?
Currently, the SparkSQL UI page shows only the actual metric info for each SparkPlan node. However, `statistics` info may help us understand how the plan was designed and why it runs slowly. This PR shows the `BroadcastHashJoinExec` `numOutputRows` `statistic` info on the SparkSQL UI page first, when it is available.
The main changes:
1. Add a `stats` field in `SparkPlan`.
2. Populate `BroadcastHashJoinExec` with the `stats` in the `Join` `LogicalPlan`.
3. Add a `stats` field in `SQLMetric` and show it on the UI page when it is available.
How was this patch tested?
A unit test has been added, and the UI has been tested manually.
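To make the last change concrete, here is a minimal sketch of a metric that holds an optional estimate next to its measured value and shows the estimate only when it exists. It is a toy Python model, not Spark's `SQLMetric`; the `Metric` class, the `render` function, and the `(est: ...)` display format are assumptions for illustration and do not reflect how the PR actually renders the value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metric:
    """Toy metric: a measured value plus an optional planner estimate."""
    name: str
    value: int = 0
    stats: Optional[int] = None  # estimated value; None when unavailable

    def add(self, n: int) -> None:
        """Accumulate the measured value at run time."""
        self.value += n


def render(metric: Metric) -> str:
    """Format a metric for display, appending the estimate if present."""
    text = f"{metric.name}: {metric.value}"
    if metric.stats is not None:
        text += f" (est: {metric.stats})"
    return text


# Usage: an operator whose planner estimate was 1000 rows emits 420 rows.
rows = Metric("number of output rows", stats=1000)
rows.add(420)
label = render(rows)
```

Keeping the estimate optional means operators without statistics render exactly as before, which matches the "when it is available" condition in the description.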