Description
For PhysicalPlan inheritors that represent commands that add columns, like
- EvalExec
- DissectExec
- GrokExec
- EnrichExec
- LookupJoinExec
the .output() method returns the logical output, which in case of name conflicts does not include the conflicting attributes from the upstream plan.
E.g. for an index with a single field idx_field, the plan for FROM idx | EVAL idx_field = to_upper(idx_field) will have an EvalExec whose .output() method includes only the newly evaluated idx_field, not the original field from the index. This is facilitated by the helper method mergeOutputAttributes.
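The shadowing behaviour described above can be illustrated with a minimal sketch. This is not the actual mergeOutputAttributes implementation: the class LogicalOutputSketch, the record Attr and the method mergeLogical are hypothetical names introduced here, and the sketch assumes that the added column names are distinct from each other.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a logical output merge (hypothetical, for illustration only).
final class LogicalOutputSketch {

    /** Hypothetical stand-in for an attribute: a column name plus the plan node that produced it. */
    record Attr(String name, String origin) {}

    /**
     * Drops every upstream attribute whose name is redefined by an added attribute,
     * then appends the added attributes. Assumes names in 'added' are distinct.
     */
    static List<Attr> mergeLogical(List<Attr> upstream, List<Attr> added) {
        List<Attr> out = new ArrayList<>();
        for (Attr u : upstream) {
            boolean shadowed = added.stream().anyMatch(a -> a.name().equals(u.name()));
            if (!shadowed) {
                out.add(u);
            }
        }
        out.addAll(added);
        return out;
    }
}
```

For the EVAL example, merging an upstream idx_field (from the index) with an added idx_field (from the EVAL) in this model yields a single attribute, the one produced by the EVAL.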
However, the corresponding, actual physical EvalOperator does not implement any handling of name conflicts; it simply appends blocks to incoming pages. The same is true for the physical operators corresponding to the other ...Exec classes from above.
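To contrast with the logical output, here is a minimal sketch of the append-only behaviour. It models a page layout as a plain list of column names; AppendOnlySketch and appendColumns are hypothetical names and do not correspond to the real operator or page classes.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of an append-only physical operator (hypothetical, for illustration only).
final class AppendOnlySketch {

    /**
     * Returns the column layout after the operator ran: all incoming columns in their
     * original order, followed by the added columns. Nothing is removed or deduplicated.
     */
    static List<String> appendColumns(List<String> incoming, List<String> added) {
        List<String> layout = new ArrayList<>(incoming);
        layout.addAll(added);
        return layout;
    }
}
```

In this model, appending idx_field to a page that already holds idx_field gives a layout with two columns of the same name, which is exactly where the physical layout and the logical output diverge.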
The fact that name conflict logic spills into physical plans leads to complications. For instance, physical plans with remote ENRICH commands sometimes require the presence of two columns with the same name: #118531
It also makes PhysicalPlans harder to reason about, and their output does not correctly represent the actual physical operations.
To simplify our query plans, we should change the contract for PhysicalPlan.output() so that it returns the actual physical output of the plan rather than its logical output. This would also solve #118531.
This is also required if we later want to go one step further and remove name conflict handling from LogicalPlans (however that might look). Name conflicts only strictly need to be handled in the Analyzer, to resolve what each name in a query refers to.