forked from pingcap/docs-cn
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
explain: add subqueries (pingcap#4912)
* explain: add subqueries * refine the language * Update explain-subqueries.md * Update explain-subqueries.md * Apply suggestions from code review Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com> Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com>
- Loading branch information
Showing
1 changed file
with
146 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,146 @@ | ||
--- | ||
title: 用 EXPLAIN 查看使用子查询的执行计划 | ||
summary: 了解 TiDB 中 EXPLAIN 语句返回的执行计划信息。 | ||
--- | ||
|
||
# 用 EXPLAIN 查看使用子查询的执行计划 | ||
|
||
TiDB 会执行多种[子查询相关的优化](/subquery-optimization.md),以提升子查询的执行性能。本文档介绍一些常见子查询的优化方式,以及如何解读 `EXPLAIN` 语句返回的执行计划信息。 | ||
|
||
本文档所使用的示例表数据如下: | ||
|
||
```sql | ||
CREATE TABLE t1 (id BIGINT NOT NULL PRIMARY KEY auto_increment, pad1 BLOB, pad2 BLOB, pad3 BLOB, int_col INT NOT NULL DEFAULT 0); | ||
CREATE TABLE t2 (id BIGINT NOT NULL PRIMARY KEY auto_increment, t1_id BIGINT NOT NULL, pad1 BLOB, pad2 BLOB, pad3 BLOB, INDEX(t1_id)); | ||
CREATE TABLE t3 ( | ||
id INT NOT NULL PRIMARY KEY auto_increment, | ||
t1_id INT NOT NULL, | ||
UNIQUE (t1_id) | ||
); | ||
|
||
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM dual; | ||
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; | ||
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; | ||
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; | ||
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; | ||
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; | ||
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; | ||
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; | ||
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; | ||
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; | ||
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; | ||
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; | ||
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; | ||
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; | ||
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; | ||
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; | ||
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; | ||
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; | ||
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; | ||
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; | ||
UPDATE t1 SET int_col = 1 WHERE pad1 = (SELECT pad1 FROM t1 ORDER BY RAND() LIMIT 1); | ||
INSERT INTO t3 SELECT NULL, id FROM t1 WHERE id < 1000; | ||
|
||
SELECT SLEEP(1); | ||
ANALYZE TABLE t1, t2, t3; | ||
``` | ||
|
||
## Inner join(无 `UNIQUE` 约束的子查询) | ||
|
||
以下示例中,`IN` 子查询会从表 `t2` 中搜索一列 ID。为保证语义正确性,TiDB 需要保证 `t1_id` 列的值具有唯一性。使用 `EXPLAIN` 可查看到该查询的执行计划去掉重复项并执行 `Inner Join` 内连接操作: | ||
|
||
```sql | ||
EXPLAIN SELECT * FROM t1 WHERE id IN (SELECT t1_id FROM t2); | ||
``` | ||
|
||
```sql | ||
+--------------------------------+----------+-----------+------------------------------+---------------------------------------------------------------------------------+ | ||
| id | estRows | task | access object | operator info | | ||
+--------------------------------+----------+-----------+------------------------------+---------------------------------------------------------------------------------+ | ||
| IndexMergeJoin_19 | 45.00 | root | | inner join, inner:TableReader_14, outer key:test.t2.t1_id, inner key:test.t1.id | | ||
| ├─HashAgg_38(Build) | 45.00 | root | | group by:test.t2.t1_id, funcs:firstrow(test.t2.t1_id)->test.t2.t1_id | | ||
| │ └─IndexReader_39 | 45.00 | root | | index:HashAgg_31 | | ||
| │ └─HashAgg_31 | 45.00 | cop[tikv] | | group by:test.t2.t1_id, | | ||
| │ └─IndexFullScan_37 | 90000.00 | cop[tikv] | table:t2, index:t1_id(t1_id) | keep order:false | | ||
| └─TableReader_14(Probe) | 1.00 | root | | data:TableRangeScan_13 | | ||
| └─TableRangeScan_13 | 1.00 | cop[tikv] | table:t1 | range: decided by [test.t2.t1_id], keep order:true | | ||
+--------------------------------+----------+-----------+------------------------------+---------------------------------------------------------------------------------+ | ||
7 rows in set (0.00 sec) | ||
``` | ||
|
||
由上述查询结果可知,TiDB 首先执行 `Index Join` 索引连接(即 `Merge Join` 合并连接的变体)操作,开始读取 `t2.t1_id` 列的索引。先是 `└─HashAgg_31` 算子的部分任务在 TiKV 中对 `t1_id` 值进行去重,然后`├─HashAgg_38(Build)` 算子的部分任务在 TiDB 中对 `t1_id` 值再次进行去重。去重操作由聚合函数 `firstrow(test.t2.t1_id)` 执行,之后会将操作结果与 `t1` 表的主键相连接。 | ||
|
||
## Inner join(有 `UNIQUE` 约束的子查询) | ||
|
||
在上述示例中,为了确保 `t1_id` 值在与表 `t1` 连接前具有唯一性,需要执行聚合运算。在以下示例中,由于 `UNIQUE` 约束已能确保 `t3.t1_id` 列值的唯一: | ||
|
||
```sql | ||
EXPLAIN SELECT * FROM t1 WHERE id IN (SELECT t1_id FROM t3); | ||
``` | ||
|
||
```sql | ||
+-----------------------------+---------+-----------+------------------------------+---------------------------------------------------------------------------------+ | ||
| id | estRows | task | access object | operator info | | ||
+-----------------------------+---------+-----------+------------------------------+---------------------------------------------------------------------------------+ | ||
| IndexMergeJoin_20 | 999.00 | root | | inner join, inner:TableReader_15, outer key:test.t3.t1_id, inner key:test.t1.id | | ||
| ├─IndexReader_39(Build) | 999.00 | root | | index:IndexFullScan_38 | | ||
| │ └─IndexFullScan_38 | 999.00 | cop[tikv] | table:t3, index:t1_id(t1_id) | keep order:false | | ||
| └─TableReader_15(Probe) | 1.00 | root | | data:TableRangeScan_14 | | ||
| └─TableRangeScan_14 | 1.00 | cop[tikv] | table:t1 | range: decided by [test.t3.t1_id], keep order:true | | ||
+-----------------------------+---------+-----------+------------------------------+---------------------------------------------------------------------------------+ | ||
5 rows in set (0.00 sec) | ||
``` | ||
|
||
从语义上看,因为约束保证了 `t3.t1_id` 列值的唯一性,TiDB 可以直接执行 `INNER JOIN` 查询。 | ||
|
||
## Semi Join(关联查询) | ||
|
||
在前两个示例中,通过 `HashAgg` 聚合操作或通过 `UNIQUE` 约束保证子查询数据的唯一性之后,TiDB 才能够执行 `Inner Join` 操作。这两种连接均使用了 `Index Join`(`Merge Join` 的变体)。 | ||
|
||
下面的例子中,TiDB 优化器则选择了一种不同的执行计划: | ||
|
||
```sql | ||
EXPLAIN SELECT * FROM t1 WHERE id IN (SELECT t1_id FROM t2 WHERE t1_id != t1.int_col); | ||
``` | ||
|
||
```sql | ||
+-----------------------------+-----------+-----------+------------------------------+--------------------------------------------------------------------------------------------------------+ | ||
| id | estRows | task | access object | operator info | | ||
+-----------------------------+-----------+-----------+------------------------------+--------------------------------------------------------------------------------------------------------+ | ||
| MergeJoin_9 | 45446.40 | root | | semi join, left key:test.t1.id, right key:test.t2.t1_id, other cond:ne(test.t2.t1_id, test.t1.int_col) | | ||
| ├─IndexReader_24(Build) | 180000.00 | root | | index:IndexFullScan_23 | | ||
| │ └─IndexFullScan_23 | 180000.00 | cop[tikv] | table:t2, index:t1_id(t1_id) | keep order:true | | ||
| └─TableReader_22(Probe) | 56808.00 | root | | data:Selection_21 | | ||
| └─Selection_21 | 56808.00 | cop[tikv] | | ne(test.t1.id, test.t1.int_col) | | ||
| └─TableFullScan_20 | 71010.00 | cop[tikv] | table:t1 | keep order:true | | ||
+-----------------------------+-----------+-----------+------------------------------+--------------------------------------------------------------------------------------------------------+ | ||
6 rows in set (0.00 sec) | ||
``` | ||
|
||
由上述查询结果可知,TiDB 执行了 `Semi Join`。不同于 `Inner Join`,`Semi Join` 仅允许右键 (`t2.t1_id`) 上的第一个值,也就是该操作将去除 `Join` 算子任务中的重复数据。`Join` 算法也包含 `Merge Join`,会按照排序顺序同时从左侧和右侧读取数据,这是一种高效的 `Zipper Merge`。 | ||
|
||
可以将原语句视为*关联子查询*,因为它引入了子查询外的 `t1.int_col` 列。然而,`EXPLAIN` 语句的返回结果显示的是[关联子查询去关联](/correlated-subquery-optimization.md)后的执行计划。条件 `t1_id != t1.int_col` 会被重写为 `t1.id != t1.int_col`。TiDB 可以从表 `t1` 中读取数据并且在 `└─Selection_21` 中执行此操作,因此这种去关联和重写操作会极大提高执行效率。 | ||
|
||
## Anti Semi Join (`NOT IN` 子查询) | ||
|
||
在以下示例中,*除非*子查询中存在 `t3.t1_id`,否则该查询将(从语义上)返回表 `t3` 中的所有行: | ||
|
||
```sql | ||
EXPLAIN SELECT * FROM t3 WHERE t1_id NOT IN (SELECT id FROM t1 WHERE int_col < 100); | ||
``` | ||
|
||
```sql | ||
+-----------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------------+ | ||
| id | estRows | task | access object | operator info | | ||
+-----------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------------+ | ||
| IndexMergeJoin_20 | 1598.40 | root | | anti semi join, inner:TableReader_15, outer key:test.t3.t1_id, inner key:test.t1.id | | ||
| ├─TableReader_28(Build) | 1998.00 | root | | data:TableFullScan_27 | | ||
| │ └─TableFullScan_27 | 1998.00 | cop[tikv] | table:t3 | keep order:false | | ||
| └─TableReader_15(Probe) | 1.00 | root | | data:Selection_14 | | ||
| └─Selection_14 | 1.00 | cop[tikv] | | lt(test.t1.int_col, 100) | | ||
| └─TableRangeScan_13 | 1.00 | cop[tikv] | table:t1 | range: decided by [test.t3.t1_id], keep order:true | | ||
+-----------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------------+ | ||
6 rows in set (0.00 sec) | ||
``` | ||
|
||
上述查询首先读取了表 `t3`,然后根据主键开始探测 (probe) 表 `t1`。连接类型是 _anti semi join_,即反半连接:之所以使用 _anti_,是因为上述示例有不存在匹配值(即 `NOT IN`)的情况;使用 `Semi Join` 则是因为仅需要匹配第一行后就可以停止查询。 |