[multistage] enhance physical plan explain #11272

@walterddr

Currently, the "physical plan explain" added in #11052 creates a DAG-like structure with all the worker IDs printed out,
but there are still several improvements we can make:

  1. When subtrees repeat, it sometimes prints only one of them and omits the rest.
  2. It doesn't include details about the logical node (such as project columns).

For example:
[0]@192.168.1.108:56541 MAIL_RECEIVE(RANDOM_DISTRIBUTED)
├── [1]@192.168.1.108:56595 MAIL_SEND(RANDOM_DISTRIBUTED)->{[0]@192.168.1.108@{56541,56541}|[0]} (Subtree Omitted)
└── [1]@192.168.1.108:56589 MAIL_SEND(RANDOM_DISTRIBUTED)->{[0]@192.168.1.108@{56541,56541}|[0]}
   └── [1]@192.168.1.108:56589 AGGREGATE_FINAL <---- this should really be on 2 servers
      └── [1]@192.168.1.108:56589 MAIL_RECEIVE(HASH_DISTRIBUTED)
         ├── [2]@192.168.1.108:56595 MAIL_SEND(HASH_DISTRIBUTED)->{[1]@192.168.1.108@{56595,56596}|[0],[1]@192.168.1.108@{56589,56590}|[1]} (Subtree Omitted) <---- subtree is omitted b/c they are the same except for the server/worker ID
         └── [2]@192.168.1.108:56589 MAIL_SEND(HASH_DISTRIBUTED)->{[1]@192.168.1.108@{56595,56596}|[0],[1]@192.168.1.108@{56589,56590}|[1]}  <---- mailbox send includes a list of receiving mailbox
            └── [2]@192.168.1.108:56589 AGGREGATE_LEAF <---- this should really be on 2 servers
               └── [2]@192.168.1.108:56589 JOIN <---- this should really be on 2 servers
                  ├── [2]@192.168.1.108:56589 MAIL_RECEIVE(HASH_DISTRIBUTED)
                  │   └── [3]@192.168.1.108:56589 MAIL_SEND(HASH_DISTRIBUTED)->{[2]@192.168.1.108@{56595,56596}|[0],[2]@192.168.1.108@{56589,56590}|[1]}
                  │      └── [3]@192.168.1.108:56589 PROJECT <---- missing project columns
                  │         └── [3]@192.168.1.108:56589 TABLE SCAN (A) null
                  └── [2]@192.168.1.108:56589 MAIL_RECEIVE(HASH_DISTRIBUTED)
                     └── [4]@192.168.1.108:56595 MAIL_SEND(HASH_DISTRIBUTED)->{[2]@192.168.1.108@{56595,56596}|[0],[2]@192.168.1.108@{56589,56590}|[1]}
                        └── [4]@192.168.1.108:56595 PROJECT
                           └── [4]@192.168.1.108:56595 TABLE SCAN (B) null

I would suggest:

  • Nodes other than mailbox send and mailbox receive shouldn't have server/worker info attached; they should show only the stage/fragment ID.
  • Attach logical info to the nodes as well.
  • As a side note, do not attach the plan to error messages when execution fails. Printing it in the log should suffice, and it will make the error message much simpler to comprehend.
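
To make the first two suggestions concrete, here is a rough sketch of the labeling rule in Python. It is not Pinot code: `PlanNode`, `label`, `render`, and the column names `col1, col2` are all hypothetical and only illustrate the idea that mailbox nodes keep the `@server` suffix while every other node shows just the stage ID plus its logical detail.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical node type for illustration only; not Pinot's actual plan classes.
@dataclass
class PlanNode:
    stage_id: int
    op: str                        # e.g. "MAIL_SEND", "PROJECT"
    server: Optional[str] = None   # host:port; only printed for mailbox nodes
    detail: str = ""               # logical info, e.g. projected columns
    children: List["PlanNode"] = field(default_factory=list)

MAILBOX_OPS = {"MAIL_SEND", "MAIL_RECEIVE"}

def label(node: PlanNode) -> str:
    """Build one line: stage ID always, server only for mailbox nodes."""
    prefix = f"[{node.stage_id}]"
    if node.op in MAILBOX_OPS and node.server:
        prefix += f"@{node.server}"
    text = f"{prefix} {node.op}"
    if node.detail:
        text += f"({node.detail})"
    return text

def render(node: PlanNode, indent: str = "", is_last: bool = True,
           is_root: bool = True) -> List[str]:
    """Return the tree as a list of lines using box-drawing connectors."""
    if is_root:
        lines = [label(node)]
        child_indent = ""
    else:
        connector = "└── " if is_last else "├── "
        lines = [indent + connector + label(node)]
        child_indent = indent + ("    " if is_last else "│   ")
    for i, child in enumerate(node.children):
        last_child = i == len(node.children) - 1
        lines.extend(render(child, child_indent, last_child, False))
    return lines

# Stage 3 from the example above, with made-up project columns.
plan = PlanNode(3, "MAIL_SEND", "192.168.1.108:56589", "HASH_DISTRIBUTED", [
    PlanNode(3, "PROJECT", "192.168.1.108:56589", "col1, col2", [
        PlanNode(3, "TABLE SCAN", "192.168.1.108:56589", "A"),
    ]),
])
lines = render(plan)
print("\n".join(lines))
```

With this rule the `PROJECT` and `TABLE SCAN` lines carry only `[3]` and their logical detail, and the server address appears once, on the `MAIL_SEND` line. The exact format of the real explain output is left to the implementation.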
