tyrone-yu
Collaborator

No description provided.

tyrone-yu requested a review from a team on September 26, 2025, 03:45
gemini-code-assist (bot), Contributor

Summary of Changes

Hello @tyrone-yu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request introduces a significant enhancement to the query optimizer by implementing a join reordering strategy that is sensitive to table ownership. By incorporating a PartyCode into table schemas, the optimizer can now prioritize joining tables belonging to the same owner, thereby reducing the cost associated with cross-owner joins. This feature aims to improve the efficiency of complex queries involving data from multiple parties by optimizing the execution plan based on data locality.

Highlights

  • Enhanced Schema with PartyCode: Introduced a PartyCode field to Schema and TableSchema structs to identify the owner of tables, enabling ownership-aware optimizations.
  • Join Reordering Optimization: Implemented a new joinReOrderSolver in the query optimizer, which uses a greedy algorithm to reorder joins.
  • Cost-Based Join Ordering: The reordering algorithm calculates join costs, distinguishing between 'LocalJoin' (same owner, lower cost) and 'NonLocalJoin' (different owners, higher cost) to prioritize efficient join execution.
  • New Utility Function: Added FilterOutInPlace in pkg/expression/util.go to assist in managing expressions during join condition processing.
  • Expanded Test Coverage: Included new test cases in cmd/regtest/testdata/two_parties.json, pkg/interpreter/translator/translator_ccl_input_for_test.go, and pkg/planner/core/testdata/runsql_in.json to validate the multi-party join reordering logic.

gemini-code-assist bot left a comment

Code Review

This pull request introduces a join reordering optimization to prioritize local joins over cross-party joins. This is a valuable performance enhancement for distributed queries. The implementation adds PartyCode to schemas to track data locality and uses a greedy algorithm with a cost model that favors local joins. The changes are well-structured and include relevant tests.

I've found a critical issue in the join group extraction logic that could lead to incorrect query results by reordering non-inner joins, and a medium-severity performance issue in a utility function. Addressing these will make the implementation more robust and efficient.

tyrone-yu requested a review from a team on September 26, 2025, 08:51
Outputs: []
}
`, testConf{groupThreshold: 0, batched: true, revealGroupCount: false}},
{`select count(*) from alice.tbl_1 as t1 join alice.tbl_2 as t2 join bob.tbl_1 as t3 join bob.tbl_2 as t4 where t1.plain_int_0 = t2.plain_int_0 and t2.plain_int_0 = t3.plain_int_0 and t3.plain_int_0 = t4.plain_int_0`, `digraph G {
Collaborator

In the example, should we swap the positions of t2 and t3 to verify the reordering?

Collaborator Author

Added.

// results in a join group {a, b, LeftJoin(c, d)}.
func extractJoinGroup(p LogicalPlan) (group []LogicalPlan, eqEdges []*expression.ScalarFunction, otherConds []expression.Expression) {
join, isJoin := p.(*LogicalJoin)
if !isJoin || join.JoinType != InnerJoin {
Collaborator

Question: does this mean the strategy only takes effect for inner joins?

Collaborator Author

Only inner joins are supported for now. Reordering left/right joins does not seem to guarantee correct results at the moment, so it is not supported yet.
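To make the inner-join restriction concrete, here is a minimal sketch of how join-group extraction can treat any non-inner join as an opaque leaf, matching the `{a, b, LeftJoin(c, d)}` example in the extractJoinGroup comment. The `Plan`, `Join`, `Table`, and `extractGroup` names are hypothetical simplifications; the PR's extractJoinGroup also collects eqEdges and otherConds, which are omitted here.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical minimal plan types; the real LogicalPlan/LogicalJoin carry
// schemas, join conditions, and more.
type JoinType int

const (
	InnerJoin JoinType = iota
	LeftOuterJoin
)

type Plan interface{ String() string }

type Table struct{ Name string }

func (t *Table) String() string { return t.Name }

type Join struct {
	Type        JoinType
	Left, Right Plan
}

func (j *Join) String() string {
	if j.Type == InnerJoin {
		return "Join(" + j.Left.String() + ", " + j.Right.String() + ")"
	}
	return "LeftJoin(" + j.Left.String() + ", " + j.Right.String() + ")"
}

// extractGroup flattens a tree of inner joins into its leaves. A non-inner
// join (or any non-join node) is returned whole, so the reorder solver can
// never move tables across an outer-join boundary.
func extractGroup(p Plan) []Plan {
	j, ok := p.(*Join)
	if !ok || j.Type != InnerJoin {
		return []Plan{p}
	}
	return append(extractGroup(j.Left), extractGroup(j.Right)...)
}

func main() {
	// Join(Join(a, b), LeftJoin(c, d))
	plan := &Join{Type: InnerJoin,
		Left:  &Join{Type: InnerJoin, Left: &Table{"a"}, Right: &Table{"b"}},
		Right: &Join{Type: LeftOuterJoin, Left: &Table{"c"}, Right: &Table{"d"}},
	}
	var names []string
	for _, leaf := range extractGroup(plan) {
		names = append(names, leaf.String())
	}
	fmt.Println(strings.Join(names, ", ")) // a, b, LeftJoin(c, d)
}
```

Keeping the left join as a single leaf preserves its semantics: only the inner-join inputs around it are candidates for reordering.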

eqEdges: eqEdges,
}
p, err = groupSolver.solve(curJoinGroup)
// TODO: @xiaoyuan support Dp solver
Collaborator

Question: what is the DP solver here meant for?

Collaborator Author

Join reorder here works by computing the cost of each join and searching for the order with the lowest estimated cost. What we use now is essentially a brute-force approach; a DP approach could cut the optimization time, but that only pays off when the joined tables are large, so we don't need it yet.
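For comparison, the DP approach mentioned in the TODO can be sketched as the classic dynamic program over table subsets: the cheapest plan for a subset is built from the cheapest plans of its two-way splits, so no join order is enumerated more than once. This is a hypothetical sketch, not code from the PR; it reuses illustrative local/non-local weights of 1 and 10, ignores join conditions, and runs in O(3^n) time for n tables.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

const (
	localJoinCost    = 1  // hypothetical: both sides owned by one party
	nonLocalJoinCost = 10 // hypothetical: sides owned by different parties
)

func joinCost(a, b string) int {
	if a != "" && a == b {
		return localJoinCost
	}
	return nonLocalJoinCost
}

// bestOrder returns the minimum total join cost over all (bushy) join trees.
// owners[i] is the party owning table i; subsets are encoded as bitmasks.
func bestOrder(owners []string) int {
	n := len(owners)
	full := 1<<n - 1
	party := make([]string, full+1) // "" if the subset spans several parties
	cost := make([]int, full+1)
	for s := 1; s <= full; s++ {
		if bits.OnesCount(uint(s)) == 1 {
			party[s] = owners[bits.TrailingZeros(uint(s))]
			continue // cost[s] stays 0: a single table needs no join
		}
		low := s & -s
		if rest := s &^ low; party[low] == party[rest] {
			party[s] = party[low]
		}
		cost[s] = math.MaxInt
		// Try every split of s into two non-empty subsets l and r.
		for l := (s - 1) & s; l > 0; l = (l - 1) & s {
			r := s ^ l
			if c := cost[l] + cost[r] + joinCost(party[l], party[r]); c < cost[s] {
				cost[s] = c
			}
		}
	}
	return cost[full]
}

func main() {
	fmt.Println(bestOrder([]string{"alice", "bob", "alice", "bob"})) // 12
}
```

The sketch returns only the minimum cost; a real solver would also record the best split for each subset so the plan itself can be rebuilt.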

4 participants