Closed
Description
Is your feature request related to a problem or challenge?
I extracted this from #13651 so it was more visible
- Part of [EPIC] A collection of items to improve DataFusion stability (reduce effort required to upgrade) #13648
- Inspired by / copied from Introduce LogicalPlan invariants, begin automatically checking them #13651 from @wiedld
During upgrades, downstream systems often hit issues caused by implicit changes to LogicalPlans (as opposed to explicit API changes): DataFusion code begins relying on new assumptions, which leads to unintended consequences when upgrading to a new version of DataFusion (see #13525).
Describe the solution you'd like
The idea is to make the current implicit assumptions ("invariants" in more formal language) explicit and automatically check them.
Examples of implicit assumptions:
- Schema column names can't be repeated (this is explicitly mentioned on [DISCUSSION] Making it easier to use DataFusion (lessons from GlareDB) #13525)
- Inputs to `UnionExec` must have the same schema
- ...
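To make the two example assumptions concrete, here is a minimal sketch of what checking them could look like. It deliberately does not use DataFusion's real types: `Schema`, `check_unique_column_names`, and `check_union_inputs` are hypothetical names invented for illustration, and a schema is reduced to an ordered list of column names.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for a plan schema:
/// just an ordered list of column names (not DataFusion's schema type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Schema {
    pub columns: Vec<String>,
}

impl Schema {
    pub fn new(columns: &[&str]) -> Self {
        Self {
            columns: columns.iter().map(|c| c.to_string()).collect(),
        }
    }
}

/// Invariant: column names within one schema must not repeat.
pub fn check_unique_column_names(schema: &Schema) -> Result<(), String> {
    let mut seen = HashSet::new();
    for name in &schema.columns {
        // `insert` returns false when the name was already present.
        if !seen.insert(name.as_str()) {
            return Err(format!("invariant violated: duplicate column name `{name}`"));
        }
    }
    Ok(())
}

/// Invariant: every input to a union must have the same schema as the first input.
pub fn check_union_inputs(inputs: &[Schema]) -> Result<(), String> {
    if let Some((first, rest)) = inputs.split_first() {
        for (i, schema) in rest.iter().enumerate() {
            if schema != first {
                return Err(format!(
                    "invariant violated: union input {} has columns {:?}, expected {:?}",
                    i + 1,
                    schema.columns,
                    first.columns
                ));
            }
        }
    }
    Ok(())
}
```

With these helpers, `check_unique_column_names(&Schema::new(&["a", "a"]))` returns an error naming the repeated column, and `check_union_inputs` rejects inputs whose column lists differ. A real check would also compare data types and nullability, which this sketch ignores.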
Describe alternatives you've considered
I like the approach @wiedld took in #13651:
- define the invariants
- check the invariants for extensible interfaces (which may be user defined)
- throw the error closer to the problem (rather than weird behavior later)
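The three steps above could be sketched roughly as follows. This is not the interface from #13651; the `PlanNode` trait, `validate_plan`, `Scan`, and `UserFilter` are all hypothetical names, and the "column must exist in the input" rule is an invented invariant used only to show the shape of the idea.

```rust
/// Hypothetical extension point: any plan node, built-in or user-defined.
pub trait PlanNode {
    fn name(&self) -> String;
    fn children(&self) -> Vec<&dyn PlanNode>;

    /// Step 1: each node defines its own invariants.
    /// The default accepts everything, so existing nodes keep working.
    fn check_invariants(&self) -> Result<(), String> {
        Ok(())
    }
}

/// Steps 2 and 3: walk the plan top-down, check every node (including
/// user-defined ones), and stop at the first violation, naming the node
/// so the error points at the problem rather than at a later symptom.
pub fn validate_plan(node: &dyn PlanNode) -> Result<(), String> {
    node.check_invariants()
        .map_err(|e| format!("{}: {e}", node.name()))?;
    for child in node.children() {
        validate_plan(child)?;
    }
    Ok(())
}

/// Simplified leaf node that exposes a list of column names.
pub struct Scan {
    pub columns: Vec<String>,
}

impl PlanNode for Scan {
    fn name(&self) -> String {
        "Scan".to_string()
    }
    fn children(&self) -> Vec<&dyn PlanNode> {
        Vec::new()
    }
}

/// Hypothetical user-defined node: filters on a column that must exist in its input.
pub struct UserFilter {
    pub column: String,
    pub input: Scan,
}

impl PlanNode for UserFilter {
    fn name(&self) -> String {
        "UserFilter".to_string()
    }
    fn children(&self) -> Vec<&dyn PlanNode> {
        vec![&self.input as &dyn PlanNode]
    }
    fn check_invariants(&self) -> Result<(), String> {
        if self.input.columns.contains(&self.column) {
            Ok(())
        } else {
            Err(format!("column `{}` not found in input", self.column))
        }
    }
}
```

Calling `validate_plan` on a `UserFilter` whose `column` is missing from its `Scan` input fails immediately with an error prefixed by the node name, instead of producing odd behavior later in planning or execution. Giving `check_invariants` a default body is one possible design choice: it lets user-defined nodes opt in without breaking existing implementations.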
Additional context
Sub tasks:
- Define infrastructure to check LogicalPlan invariants. PR: Introduce LogicalPlan invariants, begin automatically checking them #13651
- Define infrastructure to check physical plan invariants: WIP: Proposed interface for physical plan invariant checking. #13986
- Define infrastructure for user-defined invariants. See issue: Define extension API for user-defined invariants. #14029