Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SQL diagnosis rule: SQL plan changed and execution is slowed down notably #17928

Open
crazycs520 opened this issue Jun 10, 2020 · 0 comments
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. type/enhancement The issue or PR belongs to an enhancement.

Comments

@crazycs520
Copy link
Contributor

crazycs520 commented Jun 10, 2020

Introduce

If the SQL plan was changed, and the execution is slowed down notably, It may be caused by the statistics was not accurate or something else( such as a bug of TiDB 😢 ).

Task1

First, we can find the SQL of the plan changed and execution is slowed down notably, Then append a warning in the INSPECTION_RESULT table.

How to find the SQL of the plan changed?

  1. Find from cluster_statements_summary and cluster_statements_summary_history.
select count(distinct plan_digest) as count,digest,min(QUERY_SAMPLE_TEXT) 
from information_schema.cluster_statements_summary 
where SUMMARY_BEGIN_TIME <= '2020-06-10 20:26:00' 
    and SUMMARY_END_TIME >= '2020-06-10 20:26:06' 
group by digest having count > 1 limit 3
select count(distinct plan_digest) as count,digest,min(QUERY_SAMPLE_TEXT) 
from information_schema.cluster_statements_summary_history 
where SUMMARY_BEGIN_TIME <= '2020-06-10 20:26:00' 
    and SUMMARY_END_TIME >= '2020-06-10 20:26:06' 
group by digest having count > 1 limit 3
  1. Find from cluster_slow_query:
select count(distinct plan_digest) as count,digest,min(query) 
from information_schema.cluster_slow_query 
where time >= '2020-06-10 20:26:00' and time < '2020-06-10 20:26:06' 
group by digest having count > 1 limit 3

Should query different table according to the diagnosis time range:

  • cluster_statements_summary and cluster_statements_summary_history only contain the recent data, so if the diagnosis time range is in the cluster_statements_summary or cluster_statements_summary_history, we can only query the table, otherwise, should query from the cluster_slow_query.

Attention

Need specify the time range according to the INSPECTION_RESULT diagnose time range.

After find out the SQL of the plan changed, we also need to make sure the execution is slowed down notably, such as t1/t2 >= 2.

If the result was too much, I think we should also need to add a limit to this.

Task2

After task1 find out the SQL of the plan changed, maybe we can give more advice such as:

  • If this was caused by the statistics not accurate, we can advise running analyze table ... timely or use Plan binding

Refer

If you have any questions or want to discuss something, you can chat in the sig-dashboard slack channel and @ chenshuang.

@crazycs520 crazycs520 added type/enhancement The issue or PR belongs to an enhancement. difficulty/easy help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Jun 10, 2020
@tisonkun tisonkun added good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. and removed difficulty/easy labels Sep 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

No branches or pull requests

2 participants