Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execute multiple one-line nested-loop joins first #52399

Open
satanson opened this issue Oct 28, 2024 · 0 comments
Open

Execute multiple one-line nested-loop joins first #52399

satanson opened this issue Oct 28, 2024 · 0 comments
Labels
good first issue Good for newcomers type/enhancement Make an enhancement to StarRocks

Comments

@satanson
Copy link
Contributor

Enhancement

For query as follows:

prepare data

create table T (a int, b int, c int) properties("replication_num"="1");
insert into T values(1,1,1),(2,2,2),(3,3,3);
create table T0  properties("replication_num"="1")  as select * from T;
create table T1  properties("replication_num"="1")  as select * from T;
create table T2  properties("replication_num"="1")  as select * from T;

query

select *  from T where T.a = (select a from T0) or T.b = (select b from T1) or T.c = (select c from T2);

The plan would contains 3 nested-loop join, and TableScan T would be in the left side and join T0, T1 and T3 for three times, if T is a large table, the cost of memory copy during concat bi-lateral columns is very heavy.

+---------------------------------------------------------------------------------------------------+
| Explain String                                                                                    |
+---------------------------------------------------------------------------------------------------+
| - Output => [1:a, 2:b, 3:c]                                                                       |
|     - NESTLOOP/INNER JOIN [1:a = 7:a OR 2:b = 11:b OR 3:c = 14:c] => [1:a, 2:b, 3:c]              |
|             Estimates: {row: 3, cpu: 480.00, memory: 800.00, network: 0.00, cost: 336005244.00}   |
|         - NESTLOOP/CROSS JOIN => [1:a, 2:b, 3:c, 7:a, 11:b]                                       |
|                 Estimates: {row: 3, cpu: ?, memory: ?, network: ?, cost: 3.36003342E8}            |
|                 11:b := 9:b                                                                       |
|             - NESTLOOP/CROSS JOIN => [1:a, 2:b, 3:c, 7:a]                                         |
|                     Estimates: {row: 3, cpu: ?, memory: ?, network: ?, cost: 1.4400168E8}         |
|                     7:a := 4:a                                                                    |
|                 - SCAN [T] => [1:a, 2:b, 3:c]                                                     |
|                         Estimates: {row: 3, cpu: ?, memory: ?, network: ?, cost: 18.0}            |
|                         partitionRatio: 1/1, tabletRatio: 2/2                                     |
|                 - EXCHANGE(BROADCAST)                                                             |
|                         Estimates: {row: 1, cpu: ?, memory: ?, network: ?, cost: 62.0}            |
|                     - ASSERT LE 1                                                                 |
|                             Estimates: {row: 1, cpu: ?, memory: ?, network: ?, cost: 30.0}        |
|                         - EXCHANGE(GATHER)                                                        |
|                                 Estimates: {row: 3, cpu: ?, memory: ?, network: ?, cost: 30.0}    |
|                             - SCAN [T0] => [4:a]                                                  |
|                                     Estimates: {row: 3, cpu: ?, memory: ?, network: ?, cost: 6.0} |
|                                     partitionRatio: 1/1, tabletRatio: 2/2                         |
|             - EXCHANGE(BROADCAST)                                                                 |
|                     Estimates: {row: 1, cpu: ?, memory: ?, network: ?, cost: 62.0}                |
|                 - ASSERT LE 1                                                                     |
|                         Estimates: {row: 1, cpu: ?, memory: ?, network: ?, cost: 30.0}            |
|                     - EXCHANGE(GATHER)                                                            |
|                             Estimates: {row: 3, cpu: ?, memory: ?, network: ?, cost: 30.0}        |
|                         - SCAN [T1] => [9:b]                                                      |
|                                 Estimates: {row: 3, cpu: ?, memory: ?, network: ?, cost: 6.0}     |
|                                 partitionRatio: 1/1, tabletRatio: 2/2                             |
|         - EXCHANGE(BROADCAST)                                                                     |
|                 Estimates: {row: 1, cpu: ?, memory: ?, network: ?, cost: 62.0}                    |
|             - ASSERT LE 1                                                                         |
|                     Estimates: {row: 1, cpu: ?, memory: ?, network: ?, cost: 30.0}                |
|                 - EXCHANGE(GATHER)                                                                |
|                         Estimates: {row: 3, cpu: ?, memory: ?, network: ?, cost: 30.0}            |
|                     - SCAN [T2] => [14:c]                                                         |
|                             Estimates: {row: 3, cpu: ?, memory: ?, network: ?, cost: 6.0}         |
|                             partitionRatio: 1/1, tabletRatio: 2/2                                 |
+---------------------------------------------------------------------------------------------------+

We can optimized this query in two ways:

  1. In optimizer, we can generate a plan that compute T X (T0 X T1 X T2) instead of ((T X T0) X T1) X T2, in this way, memory copy would be reduced to 1/3;
  2. optimize one-line nested-loop join to eliminate memory copy of left side columns via just extending and appending right side columns to the result chunk of the nested-loop join.
@satanson satanson added type/enhancement Make an enhancement to StarRocks good first issue Good for newcomers labels Oct 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers type/enhancement Make an enhancement to StarRocks
Projects
None yet
Development

No branches or pull requests

1 participant