[refactor](stats) Persist status of analyze task to FE meta data (apache#20264)

1. Previously, a BE table named `analysis_jobs` persisted the status of analyze jobs/tasks. This had several flaws; for example, if a BE crashed, the analyze job/task would fail but its status could not be updated. The status is now persisted to FE metadata.
2. Support `DROP ANALYZE JOB [job_id]` to delete an analyze job.
3. Support `SHOW ANALYZE TASK STATUS [job_id]` to get the task status of a specific job.
4. Restrict when auto analyze can run: an auto analyze job can be executed again only after its last execution finished a while ago.
5. Support analyzing a whole DB.
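
Item 4 can be illustrated with a minimal Java sketch. It is not code from this commit: the class `AutoAnalyzeGate` and the method `shouldRun` are hypothetical names, and using the `auto_check_statistics_in_minutes` value (default 5, from the `Config` change in this commit) as the minimum gap is an assumption made only for illustration.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, NOT part of the commit: decides whether an auto analyze
// job may be scheduled again, based on when its last execution finished.
final class AutoAnalyzeGate {
    // Assumption: the minimum gap mirrors Config.auto_check_statistics_in_minutes (default 5).
    static final int AUTO_CHECK_STATISTICS_IN_MINUTES = 5;

    // Returns true if the job never finished before, or if at least
    // intervalMinutes have passed since its last execution finished.
    static boolean shouldRun(Instant lastFinished, Instant now, int intervalMinutes) {
        if (lastFinished == null) {
            return true; // no previous execution recorded
        }
        Duration elapsed = Duration.between(lastFinished, now);
        return elapsed.compareTo(Duration.ofMinutes(intervalMinutes)) >= 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        // Last run finished 10 minutes ago: eligible again.
        System.out.println(shouldRun(now.minus(Duration.ofMinutes(10)), now, AUTO_CHECK_STATISTICS_IN_MINUTES));
        // Last run finished 1 minute ago: skipped for now.
        System.out.println(shouldRun(now.minus(Duration.ofMinutes(1)), now, AUTO_CHECK_STATISTICS_IN_MINUTES));
    }
}
```

Passing `now` as a parameter rather than reading the clock inside the method keeps the check deterministic and easy to test.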
Kikyou1997 authored Jun 2, 2023
1 parent 62c188d commit e32eba8
Showing 47 changed files with 1,504 additions and 1,134 deletions.
74 changes: 21 additions & 53 deletions docs/en/docs/query-acceleration/statistics.md
@@ -79,15 +79,8 @@ The user triggers a manual collection job through a statement `ANALYZE` to colle
Column statistics collection syntax:

```SQL
ANALYZE [ SYNC ] TABLE table_name
[ (column_name [, ...]) ] [ [ WITH SYNC ] [ WITH INCREMENTAL ] [ WITH SAMPLE PERCENT | ROWS ] [ WITH PERIOD ] ] [ PROPERTIES ("key" = "value", ...) ];
```

Column histogram collection syntax:

```SQL
ANALYZE [ SYNC ] TABLE table_name
[ (column_name [, ...]) ] UPDATE HISTOGRAM [ [ WITH SYNC] [ WITH SAMPLE PERCENT | ROWS ][ WITH BUCKETS ] [ WITH PERIOD ] ] [ PROPERTIES ("key" = "value", ...) ];
ANALYZE TABLE | DATABASE table_name | db_name
[ (column_name [, ...]) ] [ [ WITH SYNC ] [ WITH INCREMENTAL ] [ WITH SAMPLE PERCENT | ROWS ] [ WITH PERIOD ] [WITH HISTOGRAM]] [ PROPERTIES ("key" = "value", ...) ];
```

Explanation:
@@ -422,7 +415,7 @@ The syntax is as follows:

```SQL
SHOW ANALYZE [ table_name | job_id ]
[ WHERE [ STATE = [ "PENDING" | "RUNNING" | "FINISHED" | "FAILED" ] ] ] [ ORDER BY ... ] [ LIMIT OFFSET ];
[ WHERE [ STATE = [ "PENDING" | "RUNNING" | "FINISHED" | "FAILED" ] ] ];
```

Explanation:
@@ -450,54 +443,29 @@ Currently `SHOW ANALYZE`, 11 columns are output, as follows:
Example:

- View statistics job information with ID `68603`, using the following syntax:
- View statistics job information with ID `20038`, using the following syntax:

```SQL
mysql> SHOW ANALYZE 68603;
+--------+--------------+----------------------------+-------------+-----------------+----------+---------------+---------+----------------------+----------+---------------+
| job_id | catalog_name | db_name | tbl_name | col_name | job_type | analysis_type | message | last_exec_time_in_ms | state | schedule_type |
+--------+--------------+----------------------------+-------------+-----------------+----------+---------------+---------+----------------------+----------+---------------+
| 68603 | internal | default_cluster:stats_test | example_tbl | | MANUAL | INDEX | | 2023-05-05 17:53:27 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | last_visit_date | MANUAL | COLUMN | | 2023-05-05 17:53:26 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | age | MANUAL | COLUMN | | 2023-05-05 17:53:27 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | sex | MANUAL | COLUMN | | 2023-05-05 17:53:26 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | date | MANUAL | COLUMN | | 2023-05-05 17:53:27 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | user_id | MANUAL | COLUMN | | 2023-05-05 17:53:25 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | max_dwell_time | MANUAL | COLUMN | | 2023-05-05 17:53:26 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | cost | MANUAL | COLUMN | | 2023-05-05 17:53:27 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | min_dwell_time | MANUAL | COLUMN | | 2023-05-05 17:53:24 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | city | MANUAL | COLUMN | | 2023-05-05 17:53:25 | FINISHED | ONCE |
+--------+--------------+----------------------------+-------------+-----------------+----------+---------------+---------+----------------------+----------+---------------+
mysql> SHOW ANALYZE 20038
+--------+--------------+----------------------+----------+-----------------------+----------+---------------+---------+----------------------+----------+---------------+
| job_id | catalog_name | db_name | tbl_name | col_name | job_type | analysis_type | message | last_exec_time_in_ms | state | schedule_type |
+--------+--------------+----------------------+----------+-----------------------+----------+---------------+---------+----------------------+----------+---------------+
| 20038 | internal | default_cluster:test | t3 | [col4,col2,col3,col1] | MANUAL | FUNDAMENTALS | | 2023-06-01 17:22:15 | FINISHED | ONCE |
+--------+--------------+----------------------+----------+-----------------------+----------+---------------+---------+----------------------+----------+---------------+

```

- To view `example_tbl` statistics job information for a table, use the following syntax:
```
mysql> show analyze task status 20038 ;
+---------+----------+---------+----------------------+----------+
| task_id | col_name | message | last_exec_time_in_ms | state |
+---------+----------+---------+----------------------+----------+
| 20039 | col4 | | 2023-06-01 17:22:15 | FINISHED |
| 20040 | col2 | | 2023-06-01 17:22:15 | FINISHED |
| 20041 | col3 | | 2023-06-01 17:22:15 | FINISHED |
| 20042 | col1 | | 2023-06-01 17:22:15 | FINISHED |
+---------+----------+---------+----------------------+----------+
```SQL
mysql> SHOW ANALYZE stats_test.example_tbl;
+--------+--------------+----------------------------+-------------+-----------------+----------+---------------+---------+----------------------+----------+---------------+
| job_id | catalog_name | db_name | tbl_name | col_name | job_type | analysis_type | message | last_exec_time_in_ms | state | schedule_type |
+--------+--------------+----------------------------+-------------+-----------------+----------+---------------+---------+----------------------+----------+---------------+
| 68603 | internal | default_cluster:stats_test | example_tbl | | MANUAL | INDEX | | 2023-05-05 17:53:27 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | last_visit_date | MANUAL | COLUMN | | 2023-05-05 17:53:26 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | age | MANUAL | COLUMN | | 2023-05-05 17:53:27 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | city | MANUAL | COLUMN | | 2023-05-05 17:53:25 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | cost | MANUAL | COLUMN | | 2023-05-05 17:53:27 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | min_dwell_time | MANUAL | COLUMN | | 2023-05-05 17:53:24 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | date | MANUAL | COLUMN | | 2023-05-05 17:53:27 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | user_id | MANUAL | COLUMN | | 2023-05-05 17:53:25 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | max_dwell_time | MANUAL | COLUMN | | 2023-05-05 17:53:26 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | sex | MANUAL | COLUMN | | 2023-05-05 17:53:26 | FINISHED | ONCE |
| 68678 | internal | default_cluster:stats_test | example_tbl | user_id | MANUAL | HISTOGRAM | | 2023-05-05 18:00:11 | FINISHED | ONCE |
| 68678 | internal | default_cluster:stats_test | example_tbl | sex | MANUAL | HISTOGRAM | | 2023-05-05 18:00:09 | FINISHED | ONCE |
| 68678 | internal | default_cluster:stats_test | example_tbl | last_visit_date | MANUAL | HISTOGRAM | | 2023-05-05 18:00:10 | FINISHED | ONCE |
| 68678 | internal | default_cluster:stats_test | example_tbl | date | MANUAL | HISTOGRAM | | 2023-05-05 18:00:10 | FINISHED | ONCE |
| 68678 | internal | default_cluster:stats_test | example_tbl | cost | MANUAL | HISTOGRAM | | 2023-05-05 18:00:10 | FINISHED | ONCE |
| 68678 | internal | default_cluster:stats_test | example_tbl | age | MANUAL | HISTOGRAM | | 2023-05-05 18:00:10 | FINISHED | ONCE |
| 68678 | internal | default_cluster:stats_test | example_tbl | min_dwell_time | MANUAL | HISTOGRAM | | 2023-05-05 18:00:10 | FINISHED | ONCE |
| 68678 | internal | default_cluster:stats_test | example_tbl | max_dwell_time | MANUAL | HISTOGRAM | | 2023-05-05 18:00:09 | FINISHED | ONCE |
| 68678 | internal | default_cluster:stats_test | example_tbl | | MANUAL | HISTOGRAM | | 2023-05-05 18:00:11 | FINISHED | ONCE |
| 68678 | internal | default_cluster:stats_test | example_tbl | city | MANUAL | HISTOGRAM | | 2023-05-05 18:00:11 | FINISHED | ONCE |
+--------+--------------+----------------------------+-------------+-----------------+----------+---------------+---------+----------------------+----------+---------------+
```

- View all statistics job information, and return the first 3 pieces of information in descending order of the last completion time, using the following syntax:
58 changes: 27 additions & 31 deletions docs/zh-CN/docs/query-acceleration/statistics.md
@@ -79,19 +79,9 @@ The Doris query optimizer uses statistics to determine the most efficient execution plan for a query
Column statistics collection syntax:

```SQL
ANALYZE [ SYNC ] TABLE table_name
ANALYZE TABLE | DATABASE table_name | db_name
[ (column_name [, ...]) ]
[ [ WITH SYNC ] [ WITH INCREMENTAL ] [ WITH SAMPLE PERCENT | ROWS ] [ WITH PERIOD ] ]
[ PROPERTIES ("key" = "value", ...) ];
```

Column histogram collection syntax:

```SQL
ANALYZE [ SYNC ] TABLE table_name
[ (column_name [, ...]) ]
UPDATE HISTOGRAM
[ [ WITH SYNC] [ WITH SAMPLE PERCENT | ROWS ][ WITH BUCKETS ] [ WITH PERIOD ] ]
[ [ WITH SYNC ] [ WITH INCREMENTAL ] [ WITH SAMPLE PERCENT | ROWS ] [ WITH PERIOD ] [WITH HISTOGRAM]]
[ PROPERTIES ("key" = "value", ...) ];
```

@@ -456,9 +446,7 @@ mysql> ANALYZE TABLE stats_test.example_tbl UPDATE HISTOGRAM WITH PERIOD 86400;

```SQL
SHOW ANALYZE [ table_name | job_id ]
[ WHERE [ STATE = [ "PENDING" | "RUNNING" | "FINISHED" | "FAILED" ] ] ]
[ ORDER BY ... ]
[ LIMIT OFFSET ];
[ WHERE [ STATE = [ "PENDING" | "RUNNING" | "FINISHED" | "FAILED" ] ] ];
```

Explanation:
@@ -486,24 +474,32 @@ SHOW ANALYZE [ table_name | job_id ]
Example:

- View statistics job information with ID `68603`, using the following syntax:
- View statistics job information with ID `20038`, using the following syntax:

```SQL
mysql> SHOW ANALYZE 68603;
+--------+--------------+----------------------------+-------------+-----------------+----------+---------------+---------+----------------------+----------+---------------+
| job_id | catalog_name | db_name | tbl_name | col_name | job_type | analysis_type | message | last_exec_time_in_ms | state | schedule_type |
+--------+--------------+----------------------------+-------------+-----------------+----------+---------------+---------+----------------------+----------+---------------+
| 68603 | internal | default_cluster:stats_test | example_tbl | | MANUAL | INDEX | | 2023-05-05 17:53:27 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | last_visit_date | MANUAL | COLUMN | | 2023-05-05 17:53:26 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | age | MANUAL | COLUMN | | 2023-05-05 17:53:27 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | sex | MANUAL | COLUMN | | 2023-05-05 17:53:26 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | date | MANUAL | COLUMN | | 2023-05-05 17:53:27 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | user_id | MANUAL | COLUMN | | 2023-05-05 17:53:25 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | max_dwell_time | MANUAL | COLUMN | | 2023-05-05 17:53:26 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | cost | MANUAL | COLUMN | | 2023-05-05 17:53:27 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | min_dwell_time | MANUAL | COLUMN | | 2023-05-05 17:53:24 | FINISHED | ONCE |
| 68603 | internal | default_cluster:stats_test | example_tbl | city | MANUAL | COLUMN | | 2023-05-05 17:53:25 | FINISHED | ONCE |
+--------+--------------+----------------------------+-------------+-----------------+----------+---------------+---------+----------------------+----------+---------------+
mysql> SHOW ANALYZE 20038
+--------+--------------+----------------------+----------+-----------------------+----------+---------------+---------+----------------------+----------+---------------+
| job_id | catalog_name | db_name | tbl_name | col_name | job_type | analysis_type | message | last_exec_time_in_ms | state | schedule_type |
+--------+--------------+----------------------+----------+-----------------------+----------+---------------+---------+----------------------+----------+---------------+
| 20038 | internal | default_cluster:test | t3 | [col4,col2,col3,col1] | MANUAL | FUNDAMENTALS | | 2023-06-01 17:22:15 | FINISHED | ONCE |
+--------+--------------+----------------------+----------+-----------------------+----------+---------------+---------+----------------------+----------+---------------+

```

Use `SHOW ANALYZE TASK STATUS [job_id]` to check the collection status of each column's statistics.

```
mysql> show analyze task status 20038 ;
+---------+----------+---------+----------------------+----------+
| task_id | col_name | message | last_exec_time_in_ms | state |
+---------+----------+---------+----------------------+----------+
| 20039 | col4 | | 2023-06-01 17:22:15 | FINISHED |
| 20040 | col2 | | 2023-06-01 17:22:15 | FINISHED |
| 20041 | col3 | | 2023-06-01 17:22:15 | FINISHED |
| 20042 | col1 | | 2023-06-01 17:22:15 | FINISHED |
+---------+----------+---------+----------------------+----------+
```

- To view statistics job information for the `example_tbl` table, use the following syntax:
@@ -1445,7 +1445,7 @@ public class Config extends ConfigBase {
* the system automatically checks the time interval for statistics
*/
@ConfField(mutable = true, masterOnly = true)
public static int auto_check_statistics_in_sec = 300;
public static int auto_check_statistics_in_minutes = 5;

/**
* If this configuration is enabled, you should also specify the trace_export_url.