Skip to content

Conversation

@max-hoffman
Copy link
Contributor

@max-hoffman max-hoffman commented Sep 1, 2023

Name binding stores caching information upfront. Rule interdependencies pushed me into fixing a bunch of other rules before tests would pass. All together I think most of the changes are simplifications that I was planning on doing related to the fixidx refactor. I was hoping to make it more piecemeal. Hopefully this gets us ~50% of the way towards removing those dependencies.

fixidx is mostly contained to reorderJoins and fixAuxiliaryExpressions now, both near the end of analysis. If we move the indexing in reorderJoins into fixAuxiliaryExpressions, all indexing will happen at the end of analysis. That would let us index complicated joins with subqueries correctly and all queries more reliably.

summary:

  • rewrite cacheability to use correlated column references
  • volatile functions now prevent caching
  • rewrite moveFiltersOutOfJoinConditions to put filters below join when appropriate
  • subquery decorrelation uses (and updates) correlated column references
  • alias subquery strings simplified to use the query string, not the plan string
  • fix jsonTable and lateral join analysis
  • fixAuxiliaryExpresssions at end of analysis
  • recursive analyzer rules (insert, trigger, procedure) are all at end of analysis now

@max-hoffman max-hoffman marked this pull request as ready for review September 2, 2023 05:08
Name binding will tag a scope's subquery metadata when resolving
and out of scope reference. Those references are used to perform
cacheability checks and subquery decorrelation.

Related fixes to indexing, joins, filter rules, and recursive analyzer
rule precedence (subquery, trigger, procedure, DML).
@max-hoffman max-hoffman force-pushed the max/cache-checks-use-expr-ids branch from 0ff3b10 to 61abfb3 Compare September 5, 2023 15:33
@max-hoffman max-hoffman requested a review from zachmu September 5, 2023 15:36
@max-hoffman
Copy link
Contributor Author

For a bit more context on the alias string rewriting, the query that was failing is select count((select * from (select pk from one_pk limit 1) as sq)) from one_pk;. The subquery in the count changes through analysis, and will no longer have the same plan string by indexing as it did during name binding. We currently use expression.String() to generate the schema for a projection, which will include the plan string of any nested subqueries. Changing the behavior of String() would affect all subquery plan nodes. When we rewrite fixidx, we can choose not to use node.String() as the way to match a subquery to its child schema (we could use ids), but that is out of scope for this PR. One alternative that is more work would include a third type of string formatter.

Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not crazy about all the string comparison happening for column resolution, but I don't have a better idea. If this moves us in the right direction then let's get it in and improve on it

@@ -300,26 +300,6 @@ WHERE
" └─ columns: [c_id c_d_id c_w_id c_last c_credit c_discount]\n" +
"",
},
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did this test go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing merge from main

" ├─ static: [{(NULL, ∞)}]\n" +
" └─ columns: [id dkcaj kng7t tw55n qrqxw ecxaj fgg57 zh72s fsk67 xqdyt tce7a iwv2h hpcms n5cc2 fhcyt etaq7 a75x7]\n" +
" ├─ columns: [pbmrx.id:4!null as id, pbmrx.TW55N:5!null as TEYBZ, pbmrx.ZH72S:6 as FB6N7]\n" +
" └─ LookupJoin\n" +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we concerned that the join types of many of these complex queries are changing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, we need statistics

}
ret, same, filters, err := recurseSubqueryForOuterFilters(n, a, scope)
ret, same, filters, newCorr, err := recurseSubqueryForOuterFilters(n, a, sql.ColSet{})
//if !newCorr.Empty() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete

}

aggName := strings.ToLower(agg.String())
aggName := strings.ToLower(plan.AliasSubqueryString(agg))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very skeptical this new method is being used in place of expression.String() everywhere it could matter

You have a plan out of this mess?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expressions are given an expression id, the expression id is used for setting execution indexes

// not recursive
cteScope := b.buildSelectStmt(inScope, union)
sqScope := inScope.push()
sqScope.initSubquery()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably better expressed as inScope.pushSubquery()

There's basically no use case for calling this except immediatley after construction, so move it to a constructor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the case is your next comment, i added a method to do both but left the init for cases where we want replace() not push()


rightInScope := inScope.replace()
rightInScope.addCte(name, cteScope)
rightInScope.initSubquery()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need inscope.replaceSubquery() here?

@max-hoffman max-hoffman merged commit ee73780 into main Sep 6, 2023
@max-hoffman max-hoffman deleted the max/cache-checks-use-expr-ids branch September 6, 2023 23:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants