
ESQL: Make PhysicalPlan.output() include all physically emitted columns/attributes #121549

Open
@alex-spies

For PhysicalPlan inheritors that represent commands that add columns, the .output() method returns the logical output, which in case of name conflicts does not include the conflicting attributes from the upstream plan. This affects, for example:

  • EvalExec
  • DissectExec
  • GrokExec
  • EnrichExec
  • LookupJoinExec

E.g. for an index with a single field idx_field, the plan for FROM idx | EVAL idx_field = to_upper(idx_field) will have an EvalExec whose .output() method includes only the newly evaluated idx_field, not the original field from the index. This is done by the helper method mergeOutputAttributes.
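To make the shadowing concrete, here is a minimal sketch of the merge behaviour described above. It is not the actual mergeOutputAttributes code; the Attr record, the MergeSketch class and the ordering (surviving upstream attributes first, new ones last) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an attribute: a name plus a unique id.
record Attr(String name, int id) {}

final class MergeSketch {
    // Logical output: upstream attributes that share a name with a newly
    // added attribute are dropped, then the new attributes are appended.
    static List<Attr> logicalOutput(List<Attr> upstream, List<Attr> added) {
        Set<String> addedNames = new HashSet<>();
        for (Attr a : added) {
            addedNames.add(a.name());
        }
        List<Attr> out = new ArrayList<>();
        for (Attr a : upstream) {
            if (!addedNames.contains(a.name())) {
                out.add(a);
            }
        }
        out.addAll(added);
        return out;
    }
}
```

For the example query, logicalOutput(List.of(new Attr("idx_field", 1)), List.of(new Attr("idx_field", 2))) yields a single attribute, the one with id 2; the original index field is no longer visible in the output.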

However, the corresponding, actual physical EvalOperator does not implement any handling of name conflicts; it simply appends blocks to incoming pages. The same is true for the physical operators corresponding to the other ...Exec classes from above.
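The append-only behaviour can be sketched as follows. This is a simplified model, not the real EvalOperator: a page is represented as a plain list of String arrays, and PageSketch and appendUpper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class PageSketch {
    // A page is modeled as an ordered list of columns ("blocks").
    // Blocks carry no names, so there is nothing to de-duplicate here:
    // the evaluated column is simply appended after the incoming ones.
    static List<String[]> appendUpper(List<String[]> page, int channel) {
        String[] in = page.get(channel);
        String[] out = new String[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i].toUpperCase(Locale.ROOT);
        }
        List<String[]> result = new ArrayList<>(page);
        result.add(out);
        return result;
    }
}
```

Applied to a one-column page, this returns a two-column page: the original values stay in channel 0 and the upper-cased values land in channel 1, even though the logical output of the plan node lists only one attribute.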

The fact that name conflict logic spills into physical plans leads to complications. For instance, physical plans with remote ENRICHs sometimes require the presence of two columns with the same name: #118531

It also makes PhysicalPlans harder to reason about and doesn't correctly represent actual physical operations.

To simplify our query plans, we should change the contract for PhysicalPlan.output() so that it returns the actual physical output of the physical plan rather than its logical output. Then #118531 will be solved, too.
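One way the proposed contract could look, as a rough sketch only: the output is the upstream output followed by every added attribute, so its size matches the number of blocks in the page. Telling same-named columns apart by a unique id is an assumption of this sketch, and ColumnAttr, PhysicalOutputSketch and channelOf are hypothetical names, not existing ESQL classes or methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an attribute: a name plus a unique id.
record ColumnAttr(String name, int id) {}

final class PhysicalOutputSketch {
    // Physical output: every upstream attribute is kept and the new
    // attributes are appended, mirroring what the operator does to pages.
    static List<ColumnAttr> physicalOutput(List<ColumnAttr> upstream, List<ColumnAttr> added) {
        List<ColumnAttr> out = new ArrayList<>(upstream);
        out.addAll(added);
        return out;
    }

    // Finds the channel (block index) of an attribute by id, so two
    // columns with the same name remain distinguishable. Returns -1 if absent.
    static int channelOf(List<ColumnAttr> output, int id) {
        for (int i = 0; i < output.size(); i++) {
            if (output.get(i).id() == id) {
                return i;
            }
        }
        return -1;
    }
}
```

With the example from above, the output would contain both idx_field attributes: the index field at channel 0 and the evaluated one at channel 1, which is the same layout as the page the operator emits.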

This is also required if we later want to go one step further and remove name conflict handling from LogicalPlans (however that might look). Name conflicts only strictly need to be handled in the Analyzer, to resolve what each name in a query refers to.
