Add information of hashAgg for memory control #6171

Merged 6 commits on Aug 20, 2021.
`configure-memory-usage.md`: 80 additions, 0 deletions
@@ -118,3 +118,83 @@ The following example constructs a memory-intensive SQL statement that triggers
* `record path` indicates the directory of status files.

5. You can see a set of files in the directory of status files (in the above example, the directory is `/tmp/1000_tidb/MC4wLjAuMDo0MDAwLzAuMC4wLjA6MTAwODA=/tmp-storage/record`), including `goroutinue`, `heap`, and `running_sql`. Each of these three files is suffixed with the time when the status files are logged. They record, respectively, the goroutine stack information, the heap memory usage, and the running SQL statements at the moment the alarm is triggered. For the format of the log content in `running_sql`, refer to [`expensive-queries`](/identify-expensive-queries.md).

## Other tidb-server memory control behaviors

### Flow control

- TiDB supports dynamic memory control for the operator that reads data. By default, this operator reads data using the maximum number of threads allowed by [`tidb_distsql_scan_concurrency`](/system-variables.md#tidb_distsql_scan_concurrency). Each time the memory usage of a single SQL statement exceeds [`tidb_mem_quota_query`](/system-variables.md#tidb_mem_quota_query), the operator that reads data stops one thread.

- This flow control behavior is controlled by the system variable [`tidb_enable_rate_limit_action`](/system-variables.md#tidb_enable_rate_limit_action).
- When flow control is triggered, TiDB outputs a log containing the keyword `memory exceeds quota, destroy one token now`.

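As a hedged illustration that is not part of the original change, the variables that govern read flow control can be inspected from a SQL client as shown below. The time window in the last query is a placeholder, and the `information_schema.cluster_log` query assumes a TiDB version (v4.0 or later) in which cluster log diagnostics are available.

```sql
-- Is flow control for the data-reading operator enabled?
SHOW VARIABLES LIKE 'tidb_enable_rate_limit_action';

-- Maximum number of threads the data-reading operator starts with
SHOW VARIABLES LIKE 'tidb_distsql_scan_concurrency';

-- Per-statement memory quota in bytes; each time it is exceeded, one read thread is stopped
SHOW VARIABLES LIKE 'tidb_mem_quota_query';

-- Disable flow control for the current session only
SET tidb_enable_rate_limit_action = OFF;

-- Search the TiDB logs of the whole cluster for flow control events.
-- The time range is a placeholder: replace it with the window you want to inspect.
SELECT time, instance, message
FROM information_schema.cluster_log
WHERE type = 'tidb'
  AND time > '2021-08-20 00:00:00' AND time < '2021-08-20 01:00:00'
  AND message LIKE '%destroy one token now%';
```

Setting the variable without `GLOBAL` only affects the current session, which keeps such an experiment from changing the behavior of other connections.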
### Disk spill

TiDB supports disk spill for execution operators. When the memory usage of a SQL statement exceeds the memory quota, tidb-server can spill the intermediate data of execution operators to disk to relieve memory pressure. The operators that support disk spill are Sort, MergeJoin, HashJoin, and HashAgg.

- Disk spill is jointly controlled by the configuration parameters [`mem-quota-query`](/tidb-configuration-file.md#mem-quota-query), [`oom-use-tmp-storage`](/tidb-configuration-file.md#oom-use-tmp-storage), [`tmp-storage-path`](/tidb-configuration-file.md#tmp-storage-path), and [`tmp-storage-quota`](/tidb-configuration-file.md#tmp-storage-quota).
- When disk spill is triggered, TiDB outputs a log containing the keyword `memory exceeds quota, spill to disk now` or `memory exceeds quota, set aggregate mode to spill-mode`.

- Disk spill for the Sort, MergeJoin, and HashJoin operators was introduced in v4.0.0; disk spill for the HashAgg operator was introduced in v5.2.0.
- When a SQL statement containing Sort, MergeJoin, or HashJoin exceeds its memory quota, TiDB triggers disk spill by default. When a SQL statement containing HashAgg exceeds its memory quota, TiDB does not trigger disk spill by default. To enable disk spill for HashAgg, set the system variable `tidb_executor_concurrency` to `1`.

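As a hedged sketch that is not part of the original change, the current values of the four configuration parameters above, and any logged spill events, can be checked from a SQL client. This assumes that `SHOW CONFIG` accepts an arbitrary `WHERE` filter and that the parameters appear under the same names as in the configuration file; the time window is a placeholder.

```sql
-- Current tidb-server configuration items that control disk spill
SHOW CONFIG WHERE type = 'tidb'
  AND name IN ('mem-quota-query', 'oom-use-tmp-storage', 'tmp-storage-path', 'tmp-storage-quota');

-- Search the TiDB logs for disk spill events (placeholder time range)
SELECT time, instance, message
FROM information_schema.cluster_log
WHERE type = 'tidb'
  AND time > '2021-08-20 00:00:00' AND time < '2021-08-20 01:00:00'
  AND (message LIKE '%spill to disk now%'
       OR message LIKE '%set aggregate mode to spill-mode%');
```

The two `LIKE` patterns match the two log keywords listed above: the first for Sort, MergeJoin, and HashJoin, the second for HashAgg.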
> **Note:**
>
> + Currently, HashAgg does not support disk spill for aggregate functions with the `DISTINCT` option. If a SQL statement uses such aggregate functions and consumes too much memory, disk spill fails.

The following example constructs a memory-intensive SQL statement to demonstrate disk spill for HashAgg:

1. Set the memory quota of a single SQL statement to 1 GB (the default value):

{{< copyable "sql" >}}

```sql
set tidb_mem_quota_query = 1 << 30;
```

2. Create a table with `CREATE TABLE t(a int);` and insert 256 rows with distinct values.

3. Execute the following SQL statement:

{{< copyable "sql" >}}

```sql
explain analyze select /*+ HASH_AGG() */ count(*) from t t1 join t t2 join t t3 group by t1.a, t2.a, t3.a;
```

Because this SQL statement consumes too much memory, the following "Out Of Memory Quota" error is returned:

```sql
ERROR 1105 (HY000): Out Of Memory Quota![conn_id=3]
```

4. Set the system variable `tidb_executor_concurrency` to `1`. With this setting, HashAgg automatically tries to spill to disk when the statement exceeds its memory quota.

{{< copyable "sql" >}}

```sql
set tidb_executor_concurrency = 1;
```

5. Execute the same SQL statement again. This time, the statement executes successfully and no error is returned. The following detailed execution plan shows that HashAgg used 600 MB of disk space.

{{< copyable "sql" >}}

```sql
explain analyze select /*+ HASH_AGG() */ count(*) from t t1 join t t2 join t t3 group by t1.a, t2.a, t3.a;
```

```sql
+---------------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------+-----------+----------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+---------------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------+-----------+----------+
| HashAgg_11 | 204.80 | 16777216 | root | | time:1m37.4s, loops:16385 | group by:test.t.a, test.t.a, test.t.a, funcs:count(1)->Column#7 | 1.13 GB | 600.0 MB |
| └─HashJoin_12 | 16777216.00 | 16777216 | root | | time:21.5s, loops:16385, build_hash_table:{total:267.2µs, fetch:228.9µs, build:38.2µs}, probe:{concurrency:1, total:35s, max:35s, probe:35s, fetch:962.2µs} | CARTESIAN inner join | 8.23 KB | 4 KB |
| ├─TableReader_21(Build) | 256.00 | 256 | root | | time:87.2µs, loops:2, cop_task: {num: 1, max: 150µs, proc_keys: 0, rpc_num: 1, rpc_time: 145.1µs, copr_cache_hit_ratio: 0.00} | data:TableFullScan_20 | 885 Bytes | N/A |
| │ └─TableFullScan_20 | 256.00 | 256 | cop[tikv] | table:t3 | tikv_task:{time:23.2µs, loops:256} | keep order:false, stats:pseudo | N/A | N/A |
| └─HashJoin_14(Probe) | 65536.00 | 65536 | root | | time:728.1µs, loops:65, build_hash_table:{total:307.5µs, fetch:277.6µs, build:29.9µs}, probe:{concurrency:1, total:34.3s, max:34.3s, probe:34.3s, fetch:278µs} | CARTESIAN inner join | 8.23 KB | 4 KB |
| ├─TableReader_19(Build) | 256.00 | 256 | root | | time:126.2µs, loops:2, cop_task: {num: 1, max: 308.4µs, proc_keys: 0, rpc_num: 1, rpc_time: 295.3µs, copr_cache_hit_ratio: 0.00} | data:TableFullScan_18 | 885 Bytes | N/A |
| │ └─TableFullScan_18 | 256.00 | 256 | cop[tikv] | table:t2 | tikv_task:{time:79.2µs, loops:256} | keep order:false, stats:pseudo | N/A | N/A |
| └─TableReader_17(Probe) | 256.00 | 256 | root | | time:211.1µs, loops:2, cop_task: {num: 1, max: 295.5µs, proc_keys: 0, rpc_num: 1, rpc_time: 279.7µs, copr_cache_hit_ratio: 0.00} | data:TableFullScan_16 | 885 Bytes | N/A |
| └─TableFullScan_16 | 256.00 | 256 | cop[tikv] | table:t1 | tikv_task:{time:71.4µs, loops:256} | keep order:false, stats:pseudo | N/A | N/A |
+---------------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------+-----------+----------+
9 rows in set (1 min 37.428 sec)
```
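Step 2 of the example does not show how the 256 rows are loaded. As a hedged sketch that is not part of the original change, one way to produce 256 distinct values in plain SQL, without a helper table, is to cross join four 4-row derived tables and treat each one as a base-4 digit. The derived-table aliases `d1` to `d4` and the column `n` are illustrative names only.

```sql
CREATE TABLE t(a int);

-- Each derived table holds the digits 0-3; 4 x 4 x 4 x 4 = 256 combinations.
-- Reading (d1, d2, d3, d4) as a base-4 number yields every value from 0 to 255 exactly once.
INSERT INTO t (a)
SELECT d1.n * 64 + d2.n * 16 + d3.n * 4 + d4.n
FROM (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS d1
CROSS JOIN (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS d2
CROSS JOIN (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS d3
CROSS JOIN (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS d4;

-- Sanity check: both counts should be 256
SELECT COUNT(*), COUNT(DISTINCT a) FROM t;
```

With 256 distinct values, the three-way self join in the example produces 256 × 256 × 256 = 16,777,216 rows, which matches the `actRows` of `HashAgg_11` in the plan above. After the experiment, you can restore the session setting with `SET tidb_executor_concurrency = 5;`, assuming the default value of `5` in this TiDB version.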