Skip to content

[Improve]Optimize Query Plan which can be sent to one datanode in fact. #752

@yanhuqing666

Description

@yanhuqing666

Sometimes, a complex query are not a real complex query for query plan:

A. For exmaple :

schema.xml

<schema name="testdb">
<table name="sharding_two_node" dataNode="dn1,dn2" rule="two_node_hash"/>
<table name="sharding_two_node2" dataNode="dn1,dn2" rule="two_node_hash"/>
</schema>

rule.xml

<tableRule name="two_node_hash">
	<rule>
		<columns>id</columns>
		<algorithm>two_node_hash</algorithm>
	</rule>
</tableRule>
<function name="two_node_hash" class="Hash">
	<property name="partitionCount">2</property>
	<property name="partitionLength">512</property>
</function>
  • steps:
    a query is that
    explain select * from sharding_two_node a join sharding_two_node2 b on a.c_flag=b.c_flag where a.id =1 and b.id=2;
  • expect result:
mysql> explain select * from sharding_two_node a join sharding_two_node2 b on a.c_flag=b.c_flag where a.id =1 and b.id=2;
+-----------+----------+-----------------------------------------------------------------------------------------------------------+
| DATA_NODE | TYPE     | SQL/REF                                                                                                   |
+-----------+----------+-----------------------------------------------------------------------------------------------------------+
| dn1       | BASE SQL | select * from sharding_two_node a join sharding_two_node2 b on a.c_flag=b.c_flag where a.id =1 and b.id=2 |
+-----------+----------+-----------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
  • real result:
mysql> explain select * from sharding_two_node a join sharding_two_node2 b on a.c_flag=b.c_flag where a.id =1 and b.id=2;
+-----------------+---------------+----------------------------------------------------------------------------------------------------------------------+
| DATA_NODE       | TYPE          | SQL/REF                                                                                                              |
+-----------------+---------------+----------------------------------------------------------------------------------------------------------------------+
| dn1_0           | BASE SQL      | select `a`.`id`,`a`.`c_flag`,`a`.`c_decimal` from  `sharding_two_node` `a` where a.id = 1 ORDER BY `a`.`c_flag` ASC  |
| merge_1         | MERGE         | dn1_0                                                                                                                |
| shuffle_field_1 | SHUFFLE_FIELD | merge_1                                                                                                              |
| dn1_1           | BASE SQL      | select `b`.`id`,`b`.`c_flag`,`b`.`c_decimal` from  `sharding_two_node2` `b` where b.id = 2 ORDER BY `b`.`c_flag` ASC |
| merge_2         | MERGE         | dn1_1                                                                                                                |
| shuffle_field_3 | SHUFFLE_FIELD | merge_2                                                                                                              |
| join_1          | JOIN          | shuffle_field_1; shuffle_field_3                                                                                     |
| shuffle_field_2 | SHUFFLE_FIELD | join_1                                                                                                               |
+-----------------+---------------+----------------------------------------------------------------------------------------------------------------------+
8 rows in set (0.01 sec)

B. Now,we will try to optimize it.

These "Complex Queries" can be sent to one datanode :

  1. explain select count(*) from sharding_two_node a where a.id =1;
  2. explain select * from sharding_two_node a join sharding_two_node2 b on a.id = b.id where a.id =1 and b.id=1;
  3. explain select * from sharding_two_node a join testdb.sharding_two_node2 b on a.c_flag=b.c_flag where a.id =1 and b.id=2;
  4. explain select * from sharding_two_node a join sharding_two_node2 b on a.c_flag=b.c_flag where (a.id =1 and b.id=1) or (a.id =2 and b.id=2);
  5. explain select * from sharding_two_node a join sharding_two_node2 b on a.id = b.id where a.id =1;
  6. explain select * from sharding_two_node a join sharding_two_node2 b where a.id = b.id and a.id =1;
  7. explain select * from sharding_two_node a join sharding_two_node2 b where a.id =b.id and (( a.id =1 and b.id=1) or ( a.c_flag=b.c_flag and a.id =2 ));
  8. explain select * from sharding_two_node a join sharding_two_node2 b where a.id =b.id and a.c_decimal=1 and (( a.id =1 and b.id=1) or ( a.c_flag=b.c_flag and a.id =2 ));
  9. explain select * from sharding_two_node a join sharding_two_node2 b where a.id =b.id and (a.c_decimal=1 and (( a.id =1 and b.id=1) or ( a.c_flag=b.c_flag and a.id =2 )));
  10. explain select * from sharding_two_node where id =1 and c_flag = (select c_flag from sharding_two_node2 where id =1 );
  11. explain select * from sharding_two_node a where a.id =1 and exists(select * from sharding_two_node2 b where a.c_flag=b.c_flag and b.id =1);
  12. explain select * from sharding_two_node where id =1 union select * from sharding_two_node2 where id =1 ;

C. These queries are not able to send one datanode:

  1. explain select * from sharding_two_node a join sharding_two_node2 b on a.c_flag=b.c_flag where a.id =1 or b.id=1;
  2. explain select * from sharding_two_node a join sharding_two_node2 b on a.c_flag=b.c_flag where (a.id =1 and b.id=1) or (a.id =513 and b.id=513);
  3. explain select * from sharding_two_node a join sharding_two_node2 b where (a.id = b.id and a.id =1 and b.id=1) or ( a.c_flag=b.c_flag and a.id =2 );
  4. explain select * from sharding_two_node a join sharding_two_node2 b where a.c_flag=b.c_flag and a.id =2 ;
  5. explain select * from sharding_two_node a join sharding_two_node2 b where a.id =b.id and (a.c_decimal=1 or (( a.id =1 and b.id=1) or ( a.c_flag=b.c_flag and a.id =2 )));
  6. explain select * from sharding_two_node where c_flag = (select c_flag from sharding_two_node2 where id =1 );
  7. explain select * from sharding_two_node where id =1 union select * from sharding_two_node2;

D. if the schema.xml is like this:

<schema name="testdb">
<table name="sharding_two_node" dataNode="dn1,dn2" rule="two_node_hash"/>
</schema>
<schema name="testdb2">
<table name="sharding_two_node2" dataNode="dn1,dn2" rule="two_node_hash"/>
</schema>

These queries should be optimized:

  1. explain select * from testdb.sharding_two_node a join testdb2.sharding_two_node2 b on a.id = b.id where a.id =1 and b.id=1;
  2. explain select * from testdb.sharding_two_node a join testdb2.sharding_two_node2 b on a.id = b.id where a.id =1 and b.id=1;
  3. use testdb2;explain select * from testdb.sharding_two_node a join sharding_two_node2 b on a.id = b.id where a.id =1 and b.id=1;
  4. use testdb;explain select * from sharding_two_node a join testdb2.sharding_two_node2 b on a.id = b.id where a.id =1 and b.id=1;

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions