[Improve]Optimize Query Plan which can be sent to one datanode in fact.

Sometimes, a complex query are not a real complex query for query plan:

A. For exmaple : 
 

**schema.xml**  

```
<schema name="testdb">
<table name="sharding_two_node" dataNode="dn1,dn2" rule="two_node_hash"/>
<table name="sharding_two_node2" dataNode="dn1,dn2" rule="two_node_hash"/>
</schema>

```

**rule.xml**  

```
<tableRule name="two_node_hash">
	<rule>
		<columns>id</columns>
		<algorithm>two_node_hash</algorithm>
	</rule>
</tableRule>
<function name="two_node_hash" class="Hash">
	<property name="partitionCount">2</property>
	<property name="partitionLength">512</property>
</function>
```


* **steps：**    
    a query is that
` explain select * from sharding_two_node a join sharding_two_node2 b on a.c_flag=b.c_flag where a.id =1 and b.id=2;`
* **expect result：**  
```
mysql> explain select * from sharding_two_node a join sharding_two_node2 b on a.c_flag=b.c_flag where a.id =1 and b.id=2;
+-----------+----------+-----------------------------------------------------------------------------------------------------------+
| DATA_NODE | TYPE     | SQL/REF                                                                                                   |
+-----------+----------+-----------------------------------------------------------------------------------------------------------+
| dn1       | BASE SQL | select * from sharding_two_node a join sharding_two_node2 b on a.c_flag=b.c_flag where a.id =1 and b.id=2 |
+-----------+----------+-----------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
```    

* **real result：**  

```
mysql> explain select * from sharding_two_node a join sharding_two_node2 b on a.c_flag=b.c_flag where a.id =1 and b.id=2;
+-----------------+---------------+----------------------------------------------------------------------------------------------------------------------+
| DATA_NODE       | TYPE          | SQL/REF                                                                                                              |
+-----------------+---------------+----------------------------------------------------------------------------------------------------------------------+
| dn1_0           | BASE SQL      | select `a`.`id`,`a`.`c_flag`,`a`.`c_decimal` from  `sharding_two_node` `a` where a.id = 1 ORDER BY `a`.`c_flag` ASC  |
| merge_1         | MERGE         | dn1_0                                                                                                                |
| shuffle_field_1 | SHUFFLE_FIELD | merge_1                                                                                                              |
| dn1_1           | BASE SQL      | select `b`.`id`,`b`.`c_flag`,`b`.`c_decimal` from  `sharding_two_node2` `b` where b.id = 2 ORDER BY `b`.`c_flag` ASC |
| merge_2         | MERGE         | dn1_1                                                                                                                |
| shuffle_field_3 | SHUFFLE_FIELD | merge_2                                                                                                              |
| join_1          | JOIN          | shuffle_field_1; shuffle_field_3                                                                                     |
| shuffle_field_2 | SHUFFLE_FIELD | join_1                                                                                                               |
+-----------------+---------------+----------------------------------------------------------------------------------------------------------------------+
8 rows in set (0.01 sec)
```

B. Now,we will try to optimize it.

These "Complex Queries" can be sent to one datanode :

1. explain select count(*) from sharding_two_node a where a.id =1;
2. explain select * from sharding_two_node a join sharding_two_node2 b on a.id = b.id where a.id =1 and b.id=1;
3. explain select * from sharding_two_node a join testdb.sharding_two_node2 b on a.c_flag=b.c_flag where a.id =1 and b.id=2;
4. explain select * from sharding_two_node a join sharding_two_node2 b on a.c_flag=b.c_flag where (a.id =1 and b.id=1) or (a.id =2 and b.id=2);
5. explain select * from sharding_two_node a join sharding_two_node2 b on a.id = b.id where a.id =1;
6. explain select * from sharding_two_node a join sharding_two_node2 b where a.id = b.id and a.id =1;
7. explain select * from sharding_two_node a join sharding_two_node2 b where  a.id =b.id and (( a.id =1 and b.id=1) or ( a.c_flag=b.c_flag and a.id =2 ));
8. explain select * from sharding_two_node a join sharding_two_node2 b where  a.id =b.id and a.c_decimal=1 and (( a.id =1 and b.id=1) or ( a.c_flag=b.c_flag and a.id =2 ));
9. explain select * from sharding_two_node a join sharding_two_node2 b where  a.id =b.id and (a.c_decimal=1 and (( a.id =1 and b.id=1) or ( a.c_flag=b.c_flag and a.id =2 )));
10. explain select * from sharding_two_node where id =1 and c_flag = (select c_flag from sharding_two_node2 where id =1 );
11. explain select * from sharding_two_node a where a.id =1 and exists(select * from sharding_two_node2 b where a.c_flag=b.c_flag and b.id =1);
12. explain select * from sharding_two_node where id =1 union select * from sharding_two_node2 where id =1 ;



C. These queries are not able to send one datanode:
1. explain select * from sharding_two_node a join sharding_two_node2 b on a.c_flag=b.c_flag where a.id =1 or b.id=1;
2. explain select * from sharding_two_node a join sharding_two_node2 b on a.c_flag=b.c_flag where (a.id =1 and b.id=1) or (a.id =513 and b.id=513);
3. explain select * from sharding_two_node a join sharding_two_node2 b where (a.id = b.id and a.id =1 and b.id=1) or ( a.c_flag=b.c_flag and a.id =2 );
4. explain select * from sharding_two_node a join sharding_two_node2 b where  a.c_flag=b.c_flag and a.id =2 ;
5. explain select * from sharding_two_node a join sharding_two_node2 b where  a.id =b.id and (a.c_decimal=1 or (( a.id =1 and b.id=1) or ( a.c_flag=b.c_flag and a.id =2 )));
6. explain select * from sharding_two_node where  c_flag = (select c_flag from sharding_two_node2 where id =1 );
7. explain select * from sharding_two_node where id =1 union select * from sharding_two_node2;

D. if the schema.xml is like this:

```
<schema name="testdb">
<table name="sharding_two_node" dataNode="dn1,dn2" rule="two_node_hash"/>
</schema>
<schema name="testdb2">
<table name="sharding_two_node2" dataNode="dn1,dn2" rule="two_node_hash"/>
</schema>
```
These queries should be optimized:
1. explain select * from `testdb`.sharding_two_node a join testdb2.`sharding_two_node2` b on a.id = b.id where a.id =1 and b.id=1;
2. explain select * from testdb.sharding_two_node a join testdb2.sharding_two_node2 b on a.id = b.id where a.id =1 and b.id=1;
3. use testdb2;explain select * from testdb.sharding_two_node a join sharding_two_node2 b on a.id = b.id where a.id =1 and b.id=1;
4. use testdb;explain select * from sharding_two_node a join testdb2.sharding_two_node2 b on a.id = b.id where a.id =1 and b.id=1;
 


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Improve]Optimize Query Plan which can be sent to one datanode in fact. #752

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Improve]Optimize Query Plan which can be sent to one datanode in fact. #752

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions