add Clinic docs (#7704)
en-jin19 authored Mar 3, 2022
1 parent cab7653 commit cd8961e
Showing 4 changed files with 553 additions and 0 deletions.
4 changes: 4 additions & 0 deletions TOC.md
@@ -191,6 +191,10 @@
- [tiup-cluster](/tiup/tiup-cluster.md)
- [tiup-mirror](/tiup/tiup-mirror.md)
- [tiup-bench](/tiup/tiup-bench.md)
- TiDB Clinic Diagnostic Service (Beta)
  - [Overview](/clinic/clinic-introduction.md)
  - [Use TiDB Clinic](/clinic/clinic-user-guide-for-tiup.md)
  - [TiDB Clinic Diagnostic Data](/clinic/clinic-data-instruction-for-tiup.md)
- [TiDB Operator](/tidb-operator-overview.md)
- Backup & Restore (BR)
- [BR Tool Overview](/br/backup-and-restore-tool.md)
140 changes: 140 additions & 0 deletions clinic/clinic-data-instruction-for-tiup.md
@@ -0,0 +1,140 @@
---
title: TiDB Clinic Diagnostic Data
summary: Learn what diagnostic data can be collected by TiDB Clinic Diagnostic Service from the TiDB and DM clusters deployed using TiUP.
---

# TiDB Clinic Diagnostic Data

This document describes the types of diagnostic data that TiDB Clinic Diagnostic Service (TiDB Clinic) can collect from the TiDB and DM clusters deployed using TiUP, and lists the data collection parameter that corresponds to each data type. When you run a command to [collect data using the Clinic Diag tool (Diag)](/clinic/clinic-user-guide-for-tiup.md), add the parameters that correspond to the types of data you want to collect.

The diagnostic data collected by TiDB Clinic is **only** used for troubleshooting cluster problems.

The Clinic Server is a cloud service set up on the PingCAP intranet (in China). If you upload the collected diagnostic data to the Clinic Server so that PingCAP technical support staff can troubleshoot cluster problems remotely, the uploaded data is stored on the AWS S3 China (Beijing) Region server set up by PingCAP. PingCAP strictly controls access to the uploaded data and allows only authorized in-house technical support staff to access it.

After a technical support case is closed, PingCAP permanently deletes or anonymizes the corresponding data within 90 days.

## TiDB clusters

This section lists the types of diagnostic data that can be collected by Diag from the TiDB clusters deployed using TiUP.

### Basic information of the cluster

| Data type | Exported file | Parameter for data collection by TiDB Clinic |
| :------ | :------ |:-------- |
| Basic information of the cluster, including the cluster ID | `cluster.json` | The data is collected per run by default. |
| Detailed information of the cluster | `meta.yaml` | The data is collected per run by default. |

### TiDB diagnostic data

| Data type | Exported file | Parameter for data collection by TiDB Clinic |
| :------ | :------ |:-------- |
| Log | `tidb.log` | `--include=log` |
| Error log | `tidb_stderr.log` | `--include=log` |
| Slow log | `tidb_slow_query.log` | `--include=log` |
| Configuration file | `tidb.toml` | `--include=config` |
| Real-time configuration | `config.json` | `--include=config` |

### TiKV diagnostic data

| Data type | Exported file | Parameter for data collection by TiDB Clinic |
| :------ | :------ |:-------- |
| Log | `tikv.log` | `--include=log` |
| Error log | `tikv_stderr.log` | `--include=log` |
| Configuration file | `tikv.toml` | `--include=config` |
| Real-time configuration | `config.json` | `--include=config` |

### PD diagnostic data

| Data type | Exported file | Parameter for data collection by TiDB Clinic |
| :------ | :------ |:-------- |
| Log | `pd.log` | `--include=log` |
| Error log | `pd_stderr.log` | `--include=log` |
| Configuration file | `pd.toml` | `--include=config` |
| Real-time configuration | `config.json` | `--include=config` |
| Outputs of the command `tiup ctl pd -u http://${pd IP}:${PORT} store` | `store.json` | `--include=config` |
| Outputs of the command `tiup ctl pd -u http://${pd IP}:${PORT} config placement-rules show` | `placement-rule.json` | `--include=config` |

### TiFlash diagnostic data

| Data type | Exported file | Parameter for data collection by TiDB Clinic |
| :------ | :------ |:-------- |
| Log | `tiflash.log` | `--include=log` |
| Error log | `tiflash_stderr.log` | `--include=log` |
| Configuration file | `tiflash-learner.toml`, `tiflash-preprocessed.toml`, `tiflash.toml` | `--include=config` |
| Real-time configuration | `config.json` | `--include=config` |

### TiCDC diagnostic data

| Data type | Exported file | Parameter for data collection by TiDB Clinic |
| :------ | :------ |:-------- |
| Log | `ticdc.log` | `--include=log`|
| Error log | `ticdc_stderr.log` | `--include=log` |
| Configuration file | `ticdc.toml` | `--include=config` |

### Prometheus monitoring data

| Data type | Exported file | Parameter for data collection by TiDB Clinic |
| :------ | :------ |:-------- |
| All metrics data | `{metric_name}.json` | `--include=monitor` |
| All alerts data | `alerts.json` | `--include=monitor` |

### TiDB system variables

| Data type | Exported file | Parameter for data collection by TiDB Clinic |
| :------ | :------ |:-------- |
| TiDB system variables (Diag does not collect this data type by default; to collect it, you must provide database credentials) | `mysql.tidb.csv` | `--include=db_vars` |
| | `global_variables.csv` | `--include=db_vars` |

### System information of the cluster

| Data type | Exported file | Parameter for data collection by TiDB Clinic |
| :------ | :------ |:-------- |
| Kernel log | `dmesg.log` | `--include=system` |
| Basic information of the system and hardware | `insight.json` | `--include=system` |
| Contents of the `/etc/security/limits.conf` file | `limits.conf` | `--include=system` |
| List of kernel parameters | `sysctl.conf` | `--include=system` |
| Socket system information, which is the output of the `ss` command | `ss.txt` | `--include=system` |
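
As a rough illustration of how the tables above can be used, the following Python sketch looks up the `--include` value listed for a few exported files and returns the distinct values needed. The dictionary is a hand-copied subset of the tables, and the names `INCLUDE_FOR_FILE`, `COLLECTED_BY_DEFAULT`, and `include_values` are hypothetical helpers, not part of Diag. How to pass the resulting values to the collection command is covered in [Use TiDB Clinic](/clinic/clinic-user-guide-for-tiup.md).

```python
# Hand-copied subset of the TiDB cluster tables: exported file -> --include value.
# These names are illustrative only; they are not part of Diag.
INCLUDE_FOR_FILE = {
    "tidb.log": "log",
    "tidb_slow_query.log": "log",
    "tidb.toml": "config",
    "config.json": "config",
    "store.json": "config",
    "alerts.json": "monitor",
    "global_variables.csv": "db_vars",
    "dmesg.log": "system",
    "ss.txt": "system",
}

# Files that the tables list as collected on every run, with no parameter needed.
COLLECTED_BY_DEFAULT = {"cluster.json", "meta.yaml"}


def include_values(files):
    """Return the sorted, de-duplicated --include values needed for the given files."""
    values = set()
    for name in files:
        if name in COLLECTED_BY_DEFAULT:
            continue  # collected per run by default, so no parameter is required
        values.add(INCLUDE_FOR_FILE[name])  # raises KeyError for files outside the subset
    return sorted(values)


if __name__ == "__main__":
    wanted = ["cluster.json", "tidb_slow_query.log", "store.json", "dmesg.log"]
    print(include_values(wanted))
```

Because several files share one parameter value, asking for the slow log, `store.json`, and the kernel log in this sketch yields only three values: `config`, `log`, and `system`.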

## DM clusters

This section lists the types of diagnostic data that can be collected by Diag from the DM clusters deployed using TiUP.

### Basic information of the cluster

| Data type | Exported file | Parameter for data collection by TiDB Clinic |
| :------ | :------ |:-------- |
| Basic information of the cluster, including the cluster ID | `cluster.json`| The data is collected per run by default. |
| Detailed information of the cluster | `meta.yaml` | The data is collected per run by default. |

### dm-master diagnostic data

| Data type | Exported file | Parameter for data collection by TiDB Clinic |
| :------ | :------ |:-------- |
| Log | `dm-master.log` | `--include=log` |
| Error log | `dm-master_stderr.log` | `--include=log` |
| Configuration file | `dm-master.toml` | `--include=config` |

### dm-worker diagnostic data

| Data type | Exported file | Parameter for data collection by TiDB Clinic |
| :------ | :------ |:-------- |
| Log | `dm-worker.log` | `--include=log` |
| Error log | `dm-worker_stderr.log` | `--include=log` |
| Configuration file | `dm-worker.toml` | `--include=config` |

### Prometheus monitoring data

| Data type | Exported file | Parameter for data collection by TiDB Clinic |
| :------ | :------ |:-------- |
| All metrics data | `{metric_name}.json` | `--include=monitor` |
| All alerts data | `alerts.json` | `--include=monitor` |

### System information of the cluster

| Data type | Exported file | Parameter for data collection by TiDB Clinic |
| :------ | :------ |:-------- |
| Kernel log | `dmesg.log` | `--include=system` |
| Basic information of the system and hardware | `insight.json` | `--include=system` |
| Contents of the `/etc/security/limits.conf` file | `limits.conf` | `--include=system` |
| List of kernel parameters | `sysctl.conf` | `--include=system` |
| Socket system information, which is the output of the `ss` command | `ss.txt` | `--include=system` |
61 changes: 61 additions & 0 deletions clinic/clinic-introduction.md
@@ -0,0 +1,61 @@
---
title: Overview of TiDB Clinic
summary: Learn about the TiDB Clinic Diagnostic Service (TiDB Clinic), including tool components, user scenarios, and implementation principles.
---

# Overview of TiDB Clinic

TiDB Clinic Diagnostic Service (TiDB Clinic) is a diagnostic service provided by PingCAP for TiDB clusters that are deployed using either TiUP or TiDB Operator. This service helps to troubleshoot cluster problems remotely and provides a quick check of cluster status locally. With TiDB Clinic, you can ensure the stable operation of your TiDB cluster for its full life-cycle, predict potential problems, reduce the probability of problems, troubleshoot cluster problems quickly, and fix cluster problems.

TiDB Clinic is currently in the Beta testing stage for invited users only. This service provides the following two components to diagnose cluster problems:

- Diag: a diagnostic tool deployed on the cluster side. Diag is used to collect cluster diagnostic data, upload diagnostic data to the Clinic Server, and perform a quick health check locally on your cluster. For a full list of diagnostic data that can be collected by Diag, see [TiDB Clinic Diagnostic Data](/clinic/clinic-data-instruction-for-tiup.md).

    > **Note:**
    >
    > - Diag temporarily **does not support** collecting data from the clusters deployed using TiDB Ansible.
    > - For the TiDB Clinic Beta version, if you want to use Diag to upload data to the Clinic Server for remote troubleshooting, you need to contact [PingCAP technical support](https://en.pingcap.com/contact-us/) to get a trial account first.

- Clinic Server: a cloud service. By providing diagnostic services in the SaaS model, the Clinic Server not only receives uploaded diagnostic data but also works as an online diagnostic environment that stores data, displays data, and provides cluster diagnostic reports.

    > **Note:**
    >
    > For the TiDB Clinic Beta version, the features of the Clinic Server are **not** open to external users. After you use Diag to upload collected data to the Clinic Server and get a data link, only authorized PingCAP technical support staff can access the link and view the data.

## User scenarios

- Troubleshoot cluster problems remotely

    When your cluster has problems that cannot be fixed quickly, you can ask for help in the [TiDB Community Slack channel](https://tidbcommunity.slack.com/archives/CH7TTLL7P) or contact PingCAP technical support. When contacting technical support for remote assistance, you need to save various diagnostic data from the cluster and forward the data to the support staff. In this case, you can use Diag to collect the complete diagnostic data with a single command, which avoids complex manual data collection. After collecting the data, you can upload it to the Clinic Server for PingCAP technical support staff to troubleshoot cluster problems. The Clinic Server provides secure storage for uploaded diagnostic data and supports online diagnosis, which greatly improves troubleshooting efficiency.

- Perform a quick check on the cluster status locally

    Even if your cluster runs stably now, it is necessary to check the cluster periodically to avoid potential stability risks. You can check the potential health risks of a cluster using the local quick check feature provided by TiDB Clinic. The TiDB Clinic Beta version provides a rationality check on cluster configuration items to discover unreasonable configurations and suggest modifications.

## Implementation principles

This section describes how Diag (the cluster-side tool provided by TiDB Clinic) collects diagnostic data from a cluster.

First, Diag gets cluster topology information from the deployment tool TiUP (tiup-cluster) or TiDB Operator (tidb-operator). Then, Diag collects different types of diagnostic data using the following methods:

- Transfer server files through SCP

    For the clusters deployed using TiUP, Diag can collect log files and configuration files directly from the nodes of the target component through the Secure Copy Protocol (SCP).

- Collect data by running commands remotely through SSH

    For the clusters deployed using TiUP, Diag can connect to the system of the target component through SSH (Secure Shell) and run commands (such as Insight) to obtain system information, including kernel logs, kernel parameters, and basic information of the system and hardware.

- Collect data through HTTP calls

    - By calling the HTTP interface of TiDB components, Diag can get the real-time configuration sampling information and the real-time performance sampling information of TiDB, TiKV, PD, and other components.
    - By calling the HTTP interface of Prometheus, Diag can get alert information and monitoring metrics data.

- Query database parameters through SQL statements

    Using SQL statements, Diag can query system variables and other information of TiDB. To use this method, you need to **additionally provide** the username and password to access TiDB when collecting data.
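
The relationship between data kinds and collection methods described above can be summarized in a small Python sketch. This is only an illustrative model of the description in this section, not Diag's actual implementation: the `Method` enum, the `COLLECTION_METHOD` table, and the `plan` and `needs_db_credentials` functions are all hypothetical names introduced here.

```python
from enum import Enum


class Method(Enum):
    """The four collection methods described in this section."""
    SCP = "transfer files through SCP"
    SSH = "run commands remotely through SSH"
    HTTP = "call an HTTP interface"
    SQL = "query through SQL statements"


# Data kind -> collection method, following the list above.
# Hypothetical table for illustration; Diag's internals may differ.
COLLECTION_METHOD = {
    "log files": Method.SCP,
    "configuration files": Method.SCP,
    "kernel logs": Method.SSH,
    "kernel parameters": Method.SSH,
    "system and hardware information": Method.SSH,
    "real-time configuration": Method.HTTP,
    "real-time performance sampling": Method.HTTP,
    "alerts": Method.HTTP,
    "monitoring metrics": Method.HTTP,
    "system variables": Method.SQL,
}


def plan(kinds):
    """Group the requested data kinds by the method used to collect them."""
    grouped = {}
    for kind in kinds:
        grouped.setdefault(COLLECTION_METHOD[kind], []).append(kind)
    return grouped


def needs_db_credentials(kinds):
    """Only the SQL method requires a TiDB username and password."""
    return any(COLLECTION_METHOD[kind] is Method.SQL for kind in kinds)


if __name__ == "__main__":
    requested = ["log files", "alerts", "system variables"]
    for method, kinds in plan(requested).items():
        print(f"{method.value}: {', '.join(kinds)}")
    print("database credentials required:", needs_db_credentials(requested))
```

In this model, a request that includes system variables is the only case where `needs_db_credentials` returns `True`, which mirrors the requirement to additionally provide the username and password for the SQL method.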

## Next step

- [Use TiDB Clinic](/clinic/clinic-user-guide-for-tiup.md)
- [TiDB Clinic Diagnostic Data](/clinic/clinic-data-instruction-for-tiup.md)