TopDown optimizing approach exploring and implementation based on cascades/columbia #51664
Description
Enhancement
For modern HTAP databases, a cascades-based optimizer makes the whole optimization framework more flexible and extensible: it is modular, rules are easy to add or update, and the catalog and cost model become more adaptive. It also trims the redundant search space through group pruning, lower-bound admission checks on branches, and equivalence-class classification, instead of fully enumerating every logical plan node from the bottom up.
Details
Infrastructure
- implement a stack mechanism (planner: add simple task stack for memo #51663)
- implement a serializing scheduler (planner: add simple serializing scheduler #51866)
- implement mutable and immutable fieldType (planner: add mutable and immutable ft implementation #51916)
- keep the task dir clean and independent of memo (planner: encapsulate task dir and move test related code to test file #52083)
- separate the pattern logic from the memo directory (planner: refactor pattern dir output memo related logic #52117)
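As a rough illustration of the first two items, a task stack drained by a serial scheduler could look like the sketch below. The names `Task`, `Stack`, `RunSerial`, and `exploreTask` are assumptions made for this example and are not taken from the referenced PRs.

```go
package main

import "fmt"

// Task is a hypothetical unit of optimizer work. Execute may push
// follow-up tasks onto the stack it is given.
type Task interface {
	Execute(s *Stack) error
}

// Stack is a simple LIFO of pending tasks.
type Stack struct{ tasks []Task }

func (s *Stack) Push(t Task) { s.tasks = append(s.tasks, t) }

// Pop removes and returns the most recently pushed task, or nil if empty.
func (s *Stack) Pop() Task {
	if len(s.tasks) == 0 {
		return nil
	}
	t := s.tasks[len(s.tasks)-1]
	s.tasks = s.tasks[:len(s.tasks)-1]
	return t
}

func (s *Stack) Empty() bool { return len(s.tasks) == 0 }

// RunSerial executes tasks one at a time until the stack is drained
// or a task fails.
func RunSerial(s *Stack) error {
	for !s.Empty() {
		if err := s.Pop().Execute(s); err != nil {
			return err
		}
	}
	return nil
}

// exploreTask is a demo task: it records its name and schedules its children.
type exploreTask struct {
	name     string
	children []string
	log      *[]string
}

func (t *exploreTask) Execute(s *Stack) error {
	*t.log = append(*t.log, t.name)
	for _, c := range t.children {
		s.Push(&exploreTask{name: c, log: t.log})
	}
	return nil
}

func main() {
	var log []string
	s := &Stack{}
	s.Push(&exploreTask{name: "root", children: []string{"left", "right"}, log: &log})
	if err := RunSerial(s); err != nil {
		panic(err)
	}
	fmt.Println(log) // prints: [root right left]
}
```

Because the stack is LIFO, the task pushed last runs first, which gives a depth-first traversal without recursion.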
HashEqual infra
- add a base hasher and equaler that take primitive types directly (planner: add canonical hasher to take in primitive type directly for hashing. #55570)
- standardize how HashEqual is integrated into base.LogicalPlan and expression.Expression (planner: integrate hashEqual interface into LogicalPlan and expression.Expression. #55652)
- introduce hashEquals for column/collationInfo/fieldType (planner: introduce hashEquals for expression.Column/collationInfo/fieldType #55691)
- introduce the hashEqual interface for datum (planner: introduce hashEqual interface for datum. #55727)
- introduce the hashEquals interface for expression.Expression (planner: introduce hashEquals interface for expression.Expression #55793)
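The items above can be pictured with a small sketch of a hasher that consumes primitive values and a hash-equals interface implemented by a column-like type. The `Hasher`, `HashEquals`, and `Column` definitions here are simplified assumptions for illustration; the real signatures live in the referenced PRs and may differ.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash"
	"hash/fnv"
)

// Hasher feeds primitive values into a 64-bit FNV-1a hash.
// (Hypothetical stand-in for a canonical base hasher.)
type Hasher struct{ h hash.Hash64 }

func NewHasher() *Hasher { return &Hasher{h: fnv.New64a()} }

func (h *Hasher) HashInt64(v int64) {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], uint64(v))
	h.h.Write(buf[:])
}

func (h *Hasher) HashString(s string) {
	// The length prefix keeps ("ab", "c") distinct from ("a", "bc").
	h.HashInt64(int64(len(s)))
	h.h.Write([]byte(s))
}

func (h *Hasher) Sum64() uint64 { return h.h.Sum64() }

// HashEquals is the pair of operations a memo needs to deduplicate objects.
type HashEquals interface {
	Hash64(h *Hasher)
	Equals(other any) bool
}

// Column is a toy column with an ID and a collation name.
type Column struct {
	ID        int64
	Collation string
}

func (c *Column) Hash64(h *Hasher) {
	h.HashInt64(c.ID)
	h.HashString(c.Collation)
}

func (c *Column) Equals(other any) bool {
	o, ok := other.(*Column)
	if !ok || o == nil {
		return false
	}
	return c.ID == o.ID && c.Collation == o.Collation
}

// HashOf is a convenience helper that hashes a single value.
func HashOf(v HashEquals) uint64 {
	h := NewHasher()
	v.Hash64(h)
	return h.Sum64()
}

func main() {
	a := &Column{ID: 1, Collation: "utf8mb4_bin"}
	b := &Column{ID: 1, Collation: "utf8mb4_bin"}
	c := &Column{ID: 2, Collation: "utf8mb4_bin"}
	fmt.Println(HashOf(a) == HashOf(b), a.Equals(b), a.Equals(c))
}
```

Hashing locates a candidate bucket cheaply, and `Equals` confirms the match, so two structurally identical objects collapse into one memo entry even when they are distinct pointers.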
Refactoring
The current plan/core pkg is huge, not as slim as we expected. The mixed placement of logicalOp, physicalOp, property, task, logical-rewrite, build-phase, binder, cost, exhaustion, etc. makes the hierarchy complicated, and the boundaries between them are not as clear as we would like. I see several reasons for this:
- One is that in Go, a method with a given struct as its receiver, including a method that implements an interface, can only be defined in the same pkg where the struct is defined (here, clearly, that pkg is core). (stackoverflow ref)
  - eg: LogicalSort is defined somewhere in the core pkg. If you want to implement (p *LogicalSort) buildKeyInfo, either to satisfy an interface or simply as a member function, it can only be done in the same core pkg, because Go only allows a local type as a receiver. Given that buildKeyInfo is logical-rewrite-related code, it should live in another part/component/pkg of the core logic for clarity, which calls for simplifying the interface or eliminating such self-receiver functions.
- Numerous utility and facility functions related to the core logic are defined and used in the core package. These include trace, hashcode, cost model, selectivity, task, enumeration, and more. Moving or adding a new feature outside the core package may force many internal structures or functions to be exported as well.
- The biggest blocker is the import cycle problem. Say we create a new pkg called core2 and want to add a new feature inside it. The new feature cannot work without the context of the current core, so by some means (exporting the original core structures or interfaces, using function pointers, or something similar) we get a dependency core2 -> core. However, this feature must also be called from core, because core is currently the unified portal for the entire optimization phase, so we also get a dependency core -> core2. As a result, oops, a Go import cycle is formed. The only way out is to go back and integrate everything into core, so the core pkg eventually grows larger and larger.

- planner: move logical optimizing trace logic out of core pkg #52161
- planner: refine cop task as a capital one for latter pkg move #52506
- planner: refine mppTask as capital one for latter pkg move #52491
- planner: move physical opt and cost misc to util and split plan interface #52224
- planner: move base plan related output of core pkg and make it well-packaged #52529
- planner: remove internal pkg and move base code to certain place #52620
- planner: rename base plan implementation's pkg from base to baseImpl #52659
- planner: move debugtracer logic and handle col definition to util pkg #52681
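The receiver restriction and the import cycle described above are commonly worked around by putting a small interface in a leaf package and turning methods into free functions that accept that interface. The single-file sketch below only simulates the package split through comments; the `LogicalPlan` interface shape, `SortKeys`, and `CollectSortKeys` are hypothetical names standing in for logic such as buildKeyInfo, not the actual TiDB API.

```go
package main

import "fmt"

// --- would live in a small leaf package (a hypothetical "base") ---

// LogicalPlan exposes only what rule code outside core needs.
type LogicalPlan interface {
	Children() []LogicalPlan
	SortKeys() []string
}

// --- would live in core, which imports base ---

// LogicalSort is a toy operator. Its methods must be declared in the
// package that defines the type, so only thin accessors stay here.
type LogicalSort struct {
	ByItems []string
	Child   LogicalPlan
}

func (p *LogicalSort) Children() []LogicalPlan {
	if p.Child == nil {
		return nil
	}
	return []LogicalPlan{p.Child}
}

func (p *LogicalSort) SortKeys() []string { return p.ByItems }

// --- would live in a rule package that imports base but not core ---

// CollectSortKeys is a free function rather than a method on *LogicalSort,
// so it does not have to be defined next to the type. It depends only on
// the interface, which keeps the dependency graph acyclic.
func CollectSortKeys(p LogicalPlan) []string {
	keys := append([]string(nil), p.SortKeys()...)
	for _, c := range p.Children() {
		keys = append(keys, CollectSortKeys(c)...)
	}
	return keys
}

func main() {
	inner := &LogicalSort{ByItems: []string{"c"}}
	outer := &LogicalSort{ByItems: []string{"a", "b"}, Child: inner}
	fmt.Println(CollectSortKeys(outer)) // prints: [a b c]
}
```

With this layout both core and the rule package depend on base, and neither needs to import the other, which is the shape the PRs listed above appear to be moving toward.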