stmtsummary: make statements summary persistent

# Abstract

This proposal describes how to persist the existing SQL statement statistics to the local disk so that the historical data will persist after TiDB restarted.

# Background

Statistics of SQL statements are cached in TiDB memory. The `STATEMENTS_SUMMARY` series tables provide the query interface of the node's own SQL statements; the `CLUSTER_STATEMENTS_SUMMARY` series tables provide the query interface of the cluster's SQL statements. When querying the cluster table, TiDB will "push down" the coprocessor request to all other TiDB nodes, summarize the results of the entire cluster and return them:

![Figure: SQL Statements Architecture](https://user-images.githubusercontent.com/40526986/215303310-6435be18-4ae8-4d3b-b029-e0bee160328e.png)

There are currently some problems with this architecture:

1. SQL statements are only maintained in memory. When TiDB is restarted for any reason, the data will be permanently lost.
2. When there are many SQL statements, querying them through the memory table will bring great pressure to TiDB.

In particular, when we need to query SQL statements, it is often because the pressure on the TiDB cluster is already high, and querying SQL statements itself will bring greater pressure to the TiDB cluster. Worst of all, when this vicious cycle triggers Out Of Memory, SQL statements are permanently lost.

This proposal solves the first problem, we will persist SQL statements to local disk, and transfer queries from memory tables to disk instead of memory. The second problem will be decoupled from the TiDB kernel and solved as a proposal of TiDB Dashboard.

# Design Details

Currently SQL statements are divided into three types:

| TABLE | DESCRIPTION |
| ------------- | ------------- |
| `STATEMENTS_SUMMARY` `CLUSTER_STATEMENTS_SUMMARY`  | Statistics that are being continuously updated in the current time window. |
| `STATEMENTS_SUMMARY_HISTORY` `CLUSTER_STATEMENTS_SUMMARY_HISTORY` | Statistics of historical time windows that have been frozen and will not change. |
| `STATEMENTS_SUMMARY_EVICTED` `CLUSTER_STATEMENTS_SUMMARY_EVICTED` | When the number of SQL exceeds the "max statements", the excess will be merged into the evicted table. |

The following figure describes the relationship between these three types of data:

![Figure: SQL Statements Data Model](https://user-images.githubusercontent.com/40526986/215303464-5b19b15a-61c3-4ca3-a8c9-ea6e3d74effd.png)

> The bold Window Size, Window Count and Max Statements are configurations that can be modified by users.

For historical statements, whenever the current window is frozen as a historical window (that is, when a new current window is generated), we directly write the frozen window to the local disk. The file is saved in the same directory as `tidb.log` and `tidb-slow.log` and is named `tidb-statements.log`. We also handle `tidb-statements.log` the same way we handle log files, allowing rotation and cleanup.

SQL statements are aggregatable data. It can be expected that the characteristics of SQL generated by most users are: "limited number of templates, large number of repeated executions". It is not feasible to process SQL statements as streaming logs in a log-like or slow-log-like manner. So, although the file extension is .log, the data format it saves will not be the standard TiDB log format, but multi-line JSON. This is because SQL statements are not typical streaming log data, they are just divided into fixed-window data blocks:

```
{"digest": "...", "begin": ..., "end": ..., "total_latency": 10, "exec_count": 10, ...}
{"digest": "...", "begin": ..., "end": ..., "total_latency": 20, "exec_count": 20, ...}
{"digest": "...", "begin": ..., "end": ..., "total_latency": 30, "exec_count": 30, ...}
...
```

Also, we need to modify the executor, instead of retrieving data from memory, the executor should retrieve data directly from disk files (similar to `SLOW_QUERY` table).

# Compatibility

There are no compatibility changes to `INFORMATION_SCHEMA`.

There is a behavior change in bindable statements. In the past we scanned all statements (including history) in memory for bindable statements, now we can only scan statements in the current time window (the others have been written to disk.).

# Roadmap

- [ ] Add experimental feature to persist historical SQL Statements to disk.
- [ ] Update executors to read historical data from disk.
- [ ] (Optimization) Add a shutdown hook for TiDB to write data that has not been persisted in memory to disk.
- [ ] Feature GA, remove the original stmtsummary module, and enable the persistent version of the stmtsummary module by default.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

stmtsummary: make statements summary persistent #40812

Abstract

Background

Design Details

Compatibility

Roadmap

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

TABLE	DESCRIPTION
`STATEMENTS_SUMMARY` `CLUSTER_STATEMENTS_SUMMARY`	Statistics that are being continuously updated in the current time window.
`STATEMENTS_SUMMARY_HISTORY` `CLUSTER_STATEMENTS_SUMMARY_HISTORY`	Statistics of historical time windows that have been frozen and will not change.
`STATEMENTS_SUMMARY_EVICTED` `CLUSTER_STATEMENTS_SUMMARY_EVICTED`	When the number of SQL exceeds the "max statements", the excess will be merged into the evicted table.