The basic assumption that, for a given operator, we can recompute its schema from its inputs' schemas is unsound.
- metadata: for plans constructed from SQL, metadata will usually be empty, but an application can attach additional metadata to a schema or field. The metadata can be assigned on the relational operator (its schema or one of its fields) and may not be derivable from the inputs.
  - for examples of metadata usage see "Change `ReturnTypeInfo` to return a `Field` rather than `DataType`" #14247 and "Support Extension Types / User Defined Types in DataFusion" #12644, but also other, non-type-related use cases, like primary ID tracking
- field qualification: a plan node may retain the field qualification of its inputs, erase it, or reassign it. At optimization time, we cannot simply assume one way or the other.
- DataFusion deals with plans created by its own frontend, but DataFusion is also a library. It also deals with plans constructed by other frontends ([Epic] Make DataFusion a reliable foundation for building query engines #12723). Optimizers need to take any valid plan and produce a valid plan.
The usage of `recompute_schema` within the optimizer should be replaced with explicit node schema updates.
For example, when pruning inputs with `RequiredIndices`, the node's schema should be pruned the same way, not recomputed anew.
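
To illustrate the difference, here is a minimal, self-contained sketch. It does not use DataFusion's actual types or APIs: `Field`, `Schema`, `recompute_from_input` and `prune_schema` are hypothetical stand-ins invented for this example, and the "recompute" step is deliberately simplified to "copy the input schema".

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for a schema field (not DataFusion's type).
#[derive(Clone, Debug, PartialEq)]
struct Field {
    qualifier: Option<String>,
    name: String,
    metadata: BTreeMap<String, String>,
}

/// Hypothetical, simplified stand-in for a plan node's schema.
#[derive(Clone, Debug, PartialEq)]
struct Schema {
    fields: Vec<Field>,
}

/// Naive recomputation (assumption: modelled as copying the input schema).
/// Whatever was attached to the node's own schema is not visible here.
fn recompute_from_input(input: &Schema) -> Schema {
    input.clone()
}

/// Explicit update: keep the node's own fields at the required indices,
/// so node-level metadata and qualification survive the pruning.
fn prune_schema(node_schema: &Schema, required: &[usize]) -> Schema {
    Schema {
        fields: required
            .iter()
            .map(|&i| node_schema.fields[i].clone())
            .collect(),
    }
}

/// Small helper for building fields in examples.
fn field(qualifier: Option<&str>, name: &str, metadata: &[(&str, &str)]) -> Field {
    Field {
        qualifier: qualifier.map(str::to_string),
        name: name.to_string(),
        metadata: metadata
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    }
}
```

Given a node whose schema erased the input qualifier `t` and attached a `primary_id` metadata entry to its first field, `prune_schema(&node_schema, &[0, 2])` keeps both properties, whereas pruning the result of `recompute_from_input` yields fields that are qualified with `t` again and carry no metadata.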
The usage of `recompute_schema` within the analyzer is left for a different issue.