Tracking issue for TiDB built-in SQL Diagnostics

The PR https://github.com/pingcap/tidb/pull/13481 proposes a new way for TiDB to acquire diagnostic information and expose it through system tables, so that users can query it with SQL. The proposal aims to make cluster-wide information queries, state acquisition, log retrieval, one-click inspection, and fault diagnosis more efficient.

## Note

This issue is a TODO list that breaks the entire feature down into smaller tasks. If you are interested in a part, use the following workflow:

1. Select the module you are interested in.
2. Create a new issue to claim the relevant work and describe the rough implementation in the new issue.
3. File a new pull request.

## Issues:

- Protocol Definition
  - [x] Define the `Diagnostics gRPC Service` and the related message types in [kvproto](https://github.com/pingcap/kvproto) #13581 @lonng
- Information Collection
  - Cluster Topology
    - Add a system table to provide cluster topology
      - [x] TiDB #13035 @lonng
    - [x] Refine the current implementation and remove the `ID` and `NAME` columns #13586 @lonng
  - Cluster Configuration
    - Add a system table to provide cluster configuration
      - [x] TiDB #13063 @lonng
      - [x] TiKV #13063 @lonng
      - [x] PD #13063 @lonng
      - [x] TiDB: predicate pushdown #13832 @lonng
  - Cluster Performance Sampling
    - Add HTTP API for cluster components to get performance sample data
      - [x] TiDB #12986 @lonng
      - [x] PD https://github.com/pingcap/pd/pull/1965 @lonng
      - [x] TiKV https://github.com/tikv/tikv/pull/5697 @YangKeao 
    - Add a system table so the sample data can be queried via SQL
      - [x] TiDB #12986 @lonng
      - [x] TiKV #13711 @lonng 
      - [x] PD #13717 @lonng 
  - Information Collection Framework:
    - Pluggable Information Collection Framework to support extended information collection rules
      - [x] TiDB #13693 @crazycs520, together with the https://github.com/pingcap/sysutil package
      - [x] PD https://github.com/pingcap/pd/pull/2024 @lonng 
      - [x] TiKV https://github.com/tikv/tikv/pull/6135 @lonng
    - Information Collection Rules
      - Hardware Information
        - CPU information: number of physical cores, number of logical cores, NUMA information, CPU frequency, CPU vendor, L1/L2/L3 cache size
          - [x] TiKV https://github.com/tikv/tikv/pull/6135 @lonng
          - [x] TiDB/PD https://github.com/pingcap/tidb/pull/13997 @crazycs520 
        - NIC information: device name, whether the NIC is enabled, manufacturer, model, bandwidth, driver version, and (optionally) the number of interface queues
          - [x] TiKV https://github.com/tikv/tikv/pull/6135 @lonng
          - [x] TiDB/PD https://github.com/pingcap/tidb/pull/13997 @crazycs520 
        - Disk information: disk name, disk capacity, disk usage, disk partition, mount information
          - [x] TiKV https://github.com/tikv/tikv/pull/6135 @lonng
          - [x] TiDB/PD https://github.com/pingcap/tidb/pull/13997 @crazycs520 
        - ~~USB device list~~
        - Memory information
          - [x] TiKV https://github.com/tikv/tikv/pull/6135 @lonng
          - [x] TiDB/PD https://github.com/pingcap/tidb/pull/13997 @crazycs520 
    - System Information
      - Kernel information: `sysctl -a` / `ulimit -a`
        - [x] TiKV https://github.com/tikv/tikv/pull/6135 @lonng
        - [x] TiDB/PD
      - Process information: current process name, command line parameters, executable file path, pid, environment variables, memory, startup time, uid, gid, process status
        - [x] TiKV https://github.com/tikv/tikv/pull/6135 @lonng
        - [ ] TiDB/PD
      - ~~File descriptor information: Available Quantity, Current Used Quantity~~
  - Load Information
    - CPU usage and 1/5/15-minute load averages
      - [x] TiKV https://github.com/tikv/tikv/pull/6135 @lonng
      - [x] TiDB/PD @crazycs520 #13693 
    - Memory: Total/Free/Available/Buffers/Cached/Active/Inactive/Swap
      - [x] TiKV https://github.com/tikv/tikv/pull/6135 @lonng
      - [x] TiDB/PD @crazycs520 #13693 
    - Disk IO:
      - [x] TiKV https://github.com/tikv/tikv/pull/6135 @lonng
        - tps: The number of transfers per second that were issued to the device.
        - rrqm/s: The number of read requests merged per second that were queued to the device.
        - wrqm/s: The number of write requests merged per second that were queued to the device.
        - r/s: The number (after merges) of read requests completed per second for the device.
        - w/s: The number (after merges) of write requests completed per second for the device.
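        - rsec/s: The number of sectors read from the device per second (iostat can also report this in kilobytes or megabytes as rkB/s or rMB/s).
        - wsec/s: The number of sectors written to the device per second (iostat can also report this in kilobytes or megabytes as wkB/s or wMB/s).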
        - await: The average time (in milliseconds) for I/O requests issued to the device to be served. 
        - %util: Percentage of elapsed time during which I/O requests were issued to the device (bandwidth utilization for the device)
      - [x] TiDB/PD @crazycs520 #13693 
        - tps: The number of transfers per second that were issued to the device.
        - rrqm/s: The number of read requests merged per second that were queued to the device.
        - wrqm/s: The number of write requests merged per second that were queued to the device.
        - r/s: The number (after merges) of read requests completed per second for the device.
        - w/s: The number (after merges) of write requests completed per second for the device.
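        - rsec/s: The number of sectors read from the device per second (iostat can also report this in kilobytes or megabytes as rkB/s or rMB/s).
        - wsec/s: The number of sectors written to the device per second (iostat can also report this in kilobytes or megabytes as wkB/s or wMB/s).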
        - await: The average time (in milliseconds) for I/O requests issued to the device to be served. 
        - %util: Percentage of elapsed time during which I/O requests were issued to the device (bandwidth utilization for the device)
    - Network IO
      - [x] TiKV https://github.com/tikv/tikv/pull/6135 @lonng
        - IFACE: name of the network interface for which statistics are reported.
        - rxpck/s: total number of packets received per second.
        - txpck/s: total number of packets transmitted per second.
        - rxkB/s: total number of kilobytes received per second.
        - txkB/s: total number of kilobytes transmitted per second.
        - rxcmp/s: number of compressed packets received per second.
        - txcmp/s: number of compressed packets transmitted per second.
        - rxmcst/s: number of multicast packets received per second.
      - [x] TiDB/PD  @crazycs520 #13693 
        - IFACE: name of the network interface for which statistics are reported.
        - rxpck/s: total number of packets received per second.
        - txpck/s: total number of packets transmitted per second.
        - rxkB/s: total number of kilobytes received per second.
        - txkB/s: total number of kilobytes transmitted per second.
        - rxcmp/s: number of compressed packets received per second.
        - txcmp/s: number of compressed packets transmitted per second.
        - rxmcst/s: number of multicast packets received per second.
  - System Info Tables
    - Hardware Info
      - [x] TiDB https://github.com/pingcap/tidb/pull/13997 @crazycs520 
    - Software Info
      - [x] TiDB https://github.com/pingcap/tidb/pull/13997 @crazycs520 
- Cluster Memory Table
  - [x] Memory table global view #13065 @crazycs520 
- Memory table refactor
  - [x] Use a virtual table framework to manage the information schema #13696 @lonng
  - [x] Extract the `LogicalMemTable` part from `DataSource` #13741 @lonng
  - [x] Predicate pushdown framework for virtual tables #13821 @lonng
- Logging framework
  - Log predicate pushdown
    - [x] TiDB https://github.com/pingcap/tidb/pull/14018 @lonng
  - LogReader executor implementation
    - [x] TiDB https://github.com/pingcap/tidb/pull/14046  @lonng 
  - gRPC Service implementation
    - [x] TiKV https://github.com/tikv/tikv/pull/5980 @lonng 
    - [x] TiDB/PD https://github.com/pingcap/sysutil/pull/2/ @lonng
    - [x] PD https://github.com/pingcap/pd/pull/2024 @lonng 
  - Log system table
    - [x] TiDB https://github.com/pingcap/tidb/pull/14018 @lonng
  - Add a `seqID` to the TiKV slow log. All statements in a large transaction share the same `start_ts`, so the `seqID` helps identify which SQL statement in the transaction a slow-log entry belongs to.
- Metrics information framework
  - Basic metrics information system table query framework
    - [x] TiDB https://github.com/pingcap/tidb/pull/13757 @crazycs520 
  - Add `remote-metrics-storage` configuration
    - [x] PD https://github.com/pingcap/pd/pull/1957 @crazycs520 
  - Implement the first version of a proxy-based PromQL query interface
    - [x] PD https://github.com/pingcap/pd/pull/1957 @crazycs520 
  - Metrics predicate pushdown
    - [x] TiDB https://github.com/pingcap/tidb/pull/14169 @crazycs520 
- Query expression mapping rules
  - Metric information table query framework. 
    - [x] TiDB: query metrics with PromQL and present the results as a table. https://github.com/pingcap/tidb/pull/13757 @crazycs520
- Diagnostics Framework
  - [x] Inspection schema https://github.com/pingcap/tidb/pull/14147 @lonng
  - [x] Diagnostics framework executor https://github.com/pingcap/tidb/pull/14114 @lonng 
  - [x] Diagnostics common rules https://github.com/pingcap/tidb/pull/14114 @lonng 
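
The Disk IO metrics listed under Load Information use iostat's definitions. The following Go sketch shows one way such figures can be derived. It assumes (as iostat does on Linux) that the collector samples the cumulative per-device counters in `/proc/diskstats` twice and divides the deltas by the sampling interval. This issue does not say how TiKV or sysutil actually compute them, and `DiskSample`, `DiskIOStats`, and `ComputeDiskIO` are hypothetical names, not part of any TiDB, TiKV, or sysutil API.

```go
package main

import "fmt"

// DiskSample holds cumulative counters for one block device, as read from a
// single line of Linux /proc/diskstats. Hypothetical type; the field numbers
// in the comments follow the kernel's iostats documentation.
type DiskSample struct {
	ReadsCompleted  uint64 // field 1: reads completed successfully
	ReadsMerged     uint64 // field 2: reads merged
	SectorsRead     uint64 // field 3: sectors read (512-byte units)
	ReadTicksMs     uint64 // field 4: time spent reading (ms)
	WritesCompleted uint64 // field 5: writes completed
	WritesMerged    uint64 // field 6: writes merged
	SectorsWritten  uint64 // field 7: sectors written (512-byte units)
	WriteTicksMs    uint64 // field 8: time spent writing (ms)
	IOTicksMs       uint64 // field 10: time spent doing I/Os (ms)
}

// DiskIOStats holds the per-interval metrics from the Disk IO list.
type DiskIOStats struct {
	TPS     float64 // tps
	RRQMps  float64 // rrqm/s
	WRQMps  float64 // wrqm/s
	Rps     float64 // r/s
	Wps     float64 // w/s
	RSecps  float64 // rsec/s
	WSecps  float64 // wsec/s
	AwaitMs float64 // await
	UtilPct float64 // %util
}

// ComputeDiskIO derives iostat-style rates from two samples taken intervalMs
// milliseconds apart. It assumes the counters did not wrap or reset between
// the samples, so every cur field is >= the matching prev field.
func ComputeDiskIO(prev, cur DiskSample, intervalMs float64) DiskIOStats {
	secs := intervalMs / 1000
	delta := func(a, b uint64) float64 { return float64(b - a) }

	reads := delta(prev.ReadsCompleted, cur.ReadsCompleted)
	writes := delta(prev.WritesCompleted, cur.WritesCompleted)

	// %util is the share of wall-clock time the device had I/O in flight.
	s := DiskIOStats{
		TPS:     (reads + writes) / secs,
		RRQMps:  delta(prev.ReadsMerged, cur.ReadsMerged) / secs,
		WRQMps:  delta(prev.WritesMerged, cur.WritesMerged) / secs,
		Rps:     reads / secs,
		Wps:     writes / secs,
		RSecps:  delta(prev.SectorsRead, cur.SectorsRead) / secs,
		WSecps:  delta(prev.SectorsWritten, cur.SectorsWritten) / secs,
		UtilPct: delta(prev.IOTicksMs, cur.IOTicksMs) / intervalMs * 100,
	}
	// await is the total read+write service time divided by the number of
	// completed requests; it stays 0 when nothing completed in the interval.
	if ios := reads + writes; ios > 0 {
		s.AwaitMs = (delta(prev.ReadTicksMs, cur.ReadTicksMs) +
			delta(prev.WriteTicksMs, cur.WriteTicksMs)) / ios
	}
	return s
}

func main() {
	// Two illustrative (made-up) samples taken one second apart.
	prev := DiskSample{ReadsCompleted: 1000, WritesCompleted: 2000, ReadTicksMs: 4000, WriteTicksMs: 9000, IOTicksMs: 12000}
	cur := DiskSample{ReadsCompleted: 1100, WritesCompleted: 2050, ReadTicksMs: 4200, WriteTicksMs: 9100, IOTicksMs: 12500}
	fmt.Printf("%+v\n", ComputeDiskIO(prev, cur, 1000))
}
```

Because every metric here is a rate over an interval, a single reading is not enough: the collector has to keep the previous sample around, and `await` and `%util` are only meaningful once two samples exist.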

**Teachability, Documentation, Adoption, Migration Strategy:**

Proposal: https://github.com/pingcap/tidb/pull/13481

Tracking issue for TiDB built-in SQL Diagnostics #13567
