Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(engine): Add first stage of physical query planning #16891

Merged
merged 10 commits into from
Mar 26, 2025

Conversation

chaudum
Copy link
Contributor

@chaudum chaudum commented Mar 25, 2025

What this PR does / why we need it:

Planning consists of two stages:

  1. Conversion stage:
    Instructions from the logical plan are converted into Nodes in the physical plan.
    Most instructions can be translated into a single Node. However, MakeTable translates into multiple DataObjScan nodes.
  2. Pushdown stage:
    a) Push down the limit of the Limit node to the DataObjScan nodes.
    b) Push down filter predicates from the Filter nodes to the DataObjScan nodes.

This PR implements only the first stage.

chaudum added 5 commits March 25, 2025 10:03
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
rather than properties

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
so it can be used by the planner

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
The planning of the physical plan happens in two stages:
1. Convert instructions of the logical plan into nodes of the physical
   plan. Most instructions translate to a single node.
2. Push down filter predicates and limits.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
@chaudum chaudum force-pushed the chaudum/physical-planner-v2 branch from 616eac9 to d333d8a Compare March 25, 2025 09:03
@chaudum chaudum marked this pull request as ready for review March 25, 2025 10:20
@chaudum chaudum requested a review from a team as a code owner March 25, 2025 10:20
Copy link
Member

@rfratto rfratto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite the discussions offline, I think this is quite clean!

Comment on lines 66 to 84
BinaryOpEq // Equality comparison (==).
BinaryOpNeq // Inequality comparison (!=).
BinaryOpGt // Greater than comparison (>).
BinaryOpGte // Greater than or equal comparison (>=).
BinaryOpLt // Less than comparison (<).
BinaryOpLte // Less than or equal comparison (<=).
BinaryOpAnd // Logical AND operation (&&).
BinaryOpOr // Logical OR operation (||).
BinaryOpXor // Logical XOR operation (^).
BinaryOpNot // Logicaal NOT operation (!).
BinaryOpAdd // Addition operation (+).
BinaryOpSub // Subtraction operation (-).
BinaryOpMul // Multiplication operation (*).
BinaryOpDiv // Division operation (/).
BinaryOpMod // Modulo operation (%).
BinaryOpMatchStr // String matching comparision (|=).
BinaryOpNotMatchStr // String not-matching comparison (!=).
BinaryOpMatchRe // Regex matching comparison (=~).
BinaryOpNotMatchRe // Regex not-matching comparison (!~).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be replaced with types.BinOpKind? Same comment for UnaryOpType.

Comment on lines 32 to 43
for _, inst := range lp.Instructions {
switch inst := inst.(type) {
case *logical.Return:
nodes, err := p.process(inst.Value)
if err != nil {
return err
}
if len(nodes) > 1 {
return errors.New("plan has more than 1 return value")
}
}
}
Copy link
Member

@rfratto rfratto Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this behave properly if the logical plan incorrectly has two return instructions? Anything after the first return should be ignored

Comment on lines 47 to 49
// Plan returns the built physical [Plan]. Requires to run [Plan.Build] first,
// otherwise it will return nil.
func (p *Planner) Plan() *Plan {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

api design nit: Should Plan.Build return the plan so you don't have to have a separate getter?

func (p *Planner) processSelect(_ *logical.Select) error {
return nil
// Convert a stream selector from the [logical.MakeTable] instruction into an [Expression].
func (p *Planner) convertSelector(inst logical.Value) Expression {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, convertSelector and convertPredicate seem to do the same thing, minus convertPredicate having the unary op.

Should we roll them up into a single function to reduce duplication? That would also allow the caller to say that a unary op is invalid for a selector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, the UnaryOp in the selector is prevented by the LogQL syntax.


// Create builds a physical plan from a logical plan using the provided context.
// It returns the built plan or an error if the build process fails.
func Create(plan *logical.Plan, ctx *Context) (*Plan, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

api nit: should Context go before plan? I think idiomatically, parameters are typically ordered from most broad to most specific.

Copy link
Member

@rfratto rfratto Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Or maybe we don't need the convenience function at all?)

@@ -0,0 +1,32 @@
package types
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Go package naming usually discourages naming a package types like this:

Don’t use a single package for all your APIs. Many well-intentioned programmers put all the interfaces exposed by their program into a single package named api, types, or interfaces, thinking it makes it easier to find the entry points to their code base. This is a mistake. Such packages suffer from the same problems as those named util or common, growing without bound, providing no guidance to users, accumulating dependencies, and colliding with other imports. Break them up, perhaps using directories to separate public packages from implementation.

I think it's fine for now since it's internal, but we should keep an eye out for opportunities to organize by domain. Some things like ColumnType might be better for the schema package, but I'm not sure about the other types yet.

Comment on lines +19 to +20
catalog Catalog
plan *Plan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity is the only planning context the catalog and will plan be reused? I'm asking because all functions could be pure. Only processMakeTable seems to need the Planner struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for now that's the case. The idea with plan was to re-use it in the second stage of planning for the filter/limit pushdown. That could possibly also be done in pure functions, though.

chaudum added 2 commits March 26, 2025 09:06
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
@chaudum chaudum merged commit efe1548 into main Mar 26, 2025
62 checks passed
@chaudum chaudum deleted the chaudum/physical-planner-v2 branch March 26, 2025 12:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants