Make it easier to discover performance issues and diagnose their causes (Stage 1) #18867
Open
Description
Simplify the process of discovering performance issues, as well as diagnosing their root causes.
As the first stage, we focus on these specific root causes:
- hot spot
- disk jitter
- GC or tombstone data
- unnecessarily large data scan (a better plan was available)
- transaction conflict
- small resource capacity (thread pool full)
We plan to complete the following task items (the list is still growing, and some items may not be limited to TiDB):
- Visualize regions and replicas, to reveal scheduling-related imbalance.
- Show statements directly in KeyViz, so that users who see a hotspot can drill down to the statement that caused it.
- Introduce Timeline tracing to TiDB and TiKV, to track the execution information of each step of a specific statement (instead of all statements via metrics).
- Report visible versions and all versions, as well as RocksDB PerfContext, from TiKV to TiDB (see "Fix incorrect processed / total keys counter", tikv/tikv#7563) to discover tombstone and block cache related slowness.
- Turn on PerfContext for some IO related metrics, to know the IO cost, once "Feature request: Add a bitmask to enable individual perf context counter" (facebook/rocksdb#7073) is implemented.
- Provide more details about the executor information by improving runtime information collection.
- Improve runtime information for transactions (especially conflicts).
- Implement diagnostics rules to automatically discover some kinds of hotspot issues.
- Implement diagnostics rules to automatically discover SQL statements whose plans were de-optimized.
- Make it easier to perform and understand diagnostics, including:
  - A visualization of the relationships between diagnostics report fields
  - Refining the TiDB Dashboard diagnostics generation page, separating the diagnostics result from the cluster report
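To illustrate what a hotspot diagnostics rule from the list above might look like, here is a minimal sketch. It is not the rule used by TiDB Dashboard: the `RegionTraffic` type, the `find_hot_regions` function, and the "10x the median" threshold are all hypothetical and chosen only for illustration. It assumes per-region traffic samples have already been collected.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RegionTraffic:
    """Hypothetical per-region traffic sample (not a real TiDB/TiKV type)."""
    region_id: int
    bytes_per_min: float


def find_hot_regions(samples, factor=10.0, min_bytes=1.0):
    """Return IDs of regions whose traffic exceeds `factor` times the median.

    `factor` and `min_bytes` are illustrative defaults, not tuned values.
    `min_bytes` keeps near-idle clusters from flagging trivial traffic.
    """
    if not samples:
        return []
    threshold = factor * median(s.bytes_per_min for s in samples)
    return [
        s.region_id
        for s in samples
        if s.bytes_per_min > threshold and s.bytes_per_min >= min_bytes
    ]


# Example: nine regions with even traffic and one far busier region.
samples = [RegionTraffic(i, 100.0) for i in range(1, 10)]
samples.append(RegionTraffic(10, 5000.0))
hot = find_hot_regions(samples)  # only region 10 is above 10x the median
```

A median-based threshold is used here because a single very hot region barely moves the median, whereas it would inflate a mean-based threshold. A real rule would likely also consider read versus write traffic and how long the imbalance persists.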
Some previous attempts under this scope:
Category
- Feature
Value
Value description
(TBD)
Value score
- 5
Workload estimation
- 360 person-day
Time
GanttStart: 2020-08-01
GanttDue: 2020-11-30
GanttProgress: 80%