Add SQL diagnosis rule: SQL plan changed and execution is slowed down notably #17928

crazycs520 · 2020-06-10T12:30:29Z

Introduce

If the SQL plan was changed, and the execution is slowed down notably, It may be caused by the statistics was not accurate or something else( such as a bug of TiDB 😢 ).

Task1

First, we can find the SQL of the plan changed and execution is slowed down notably, Then append a warning in the INSPECTION_RESULT table.

How to find the SQL of the plan changed?

Find from cluster_statements_summary and cluster_statements_summary_history.

select count(distinct plan_digest) as count,digest,min(QUERY_SAMPLE_TEXT) 
from information_schema.cluster_statements_summary 
where SUMMARY_BEGIN_TIME <= '2020-06-10 20:26:00' 
    and SUMMARY_END_TIME >= '2020-06-10 20:26:06' 
group by digest having count > 1 limit 3

select count(distinct plan_digest) as count,digest,min(QUERY_SAMPLE_TEXT) 
from information_schema.cluster_statements_summary_history 
where SUMMARY_BEGIN_TIME <= '2020-06-10 20:26:00' 
    and SUMMARY_END_TIME >= '2020-06-10 20:26:06' 
group by digest having count > 1 limit 3

Find from cluster_slow_query:

select count(distinct plan_digest) as count,digest,min(query) 
from information_schema.cluster_slow_query 
where time >= '2020-06-10 20:26:00' and time < '2020-06-10 20:26:06' 
group by digest having count > 1 limit 3

Should query different table according to the diagnosis time range:

cluster_statements_summary and cluster_statements_summary_history only contain the recent data, so if the diagnosis time range is in the cluster_statements_summary or cluster_statements_summary_history, we can only query the table, otherwise, should query from the cluster_slow_query.

Attention

Need specify the time range according to the INSPECTION_RESULT diagnose time range.

After find out the SQL of the plan changed, we also need to make sure the execution is slowed down notably, such as t1/t2 >= 2.

If the result was too much, I think we should also need to add a limit to this.

Task2

After task1 find out the SQL of the plan changed, maybe we can give more advice such as:

If this was caused by the statistics not accurate, we can advise running analyze table ... timely or use Plan binding

Refer

doc of Statement Summary Tables
Docs of CLUSTER_SLOW_QUERY table
You can refer the configInspection implementation in TiDB.

If you have any questions or want to discuss something, you can chat in the sig-dashboard slack channel and @ chenshuang.

The text was updated successfully, but these errors were encountered:

crazycs520 added type/enhancement The issue or PR belongs to an enhancement. difficulty/easy help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Jun 10, 2020

crazycs520 mentioned this issue Jun 10, 2020

Add more SQL diagnosis rule. #17927

Open

3 tasks

tisonkun added good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. and removed difficulty/easy labels Sep 1, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add SQL diagnosis rule: SQL plan changed and execution is slowed down notably #17928

Add SQL diagnosis rule: SQL plan changed and execution is slowed down notably #17928

crazycs520 commented Jun 10, 2020 •

edited

Loading

Add SQL diagnosis rule: SQL plan changed and execution is slowed down notably #17928

Add SQL diagnosis rule: SQL plan changed and execution is slowed down notably #17928

Comments

crazycs520 commented Jun 10, 2020 • edited Loading

Introduce

Task1

How to find the SQL of the plan changed?

Task2

Refer

crazycs520 commented Jun 10, 2020 •

edited

Loading