Skip to content

Conversation

@max-hoffman
Copy link
Contributor

@max-hoffman max-hoffman commented Jan 30, 2023

A join filter that evaluates to false or nil can still return rows in certain cases. One of the cases we evaluated incorrectly were LEFT_MERGE_JOINs with multiple filters, which should always return the left row even if a filter returns false. MERGE_JOIN's first filter is specially selected because its Left/Right expressions reference attributes for two tables that will arrive sorted given the indexes we chose to read. We use that first filter for the merge comparison direction.

Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be fine but it's pretty mysterious. Want to help me out?

filters := expression.SplitConjunction(j.Filter)
children = append(children, fmt.Sprintf("cmp: %s", filters[0]))
if len(filters) > 1 {
children = append(children, fmt.Sprintf("sel: %s", expression.JoinAnd(filters[1:]...)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this is doing. I don't understand what cmp and sel mean in this context.

If the first predicate in the join condition is semantically meaningful, why doesn't the data model node reflect that? I.e. why isn't there are MergeJoinNode that has different semantics for its Filters?

Also, why JoinNode.Filter? Isn't is JoinNode.JoinCond?

var iter sql.RowIter = &mergeJoinIter{
left: l,
right: r,
filters: filters[1:],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This merits a little explanation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like relying on a strict ordering of the predicates like this is fragile, who's to say we maintain their order throughout various transformations in the rest of the analyzer?

}

res, err := i.expr.Compare(ctx, i.fullRow)
func (i *mergeJoinIter) Next(ctx *sql.Context) (sql.Row, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is maybe a little too gross to ship. Gotos really should be limited in their application to skip certain parts of nested loop structures, not the primary control flow. Unless you have a really, really good reason. Which as far as I can tell, you don't.

Structure this as a state machine inside a for loop, something like:

var nextState = init
for nextState != end {
  switch nextState {
    case init:
       ...
       nextState = exhaustCheck
  }
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can define a custom type for your state machine states and consts for all possible values

@max-hoffman
Copy link
Contributor Author

@zachmu i added more comments to make merge's first filter less mysterious, and rewrote the goto join iters as switch statement state machines.

Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, this is easier to understand

@max-hoffman max-hoffman merged commit 8e0c643 into main Jan 31, 2023
@max-hoffman max-hoffman deleted the max/merge-join-nullable-filters branch January 31, 2023 01:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants