Update TiKV store status information #6949

Merged
merged 12 commits into from
Mar 31, 2022
4 changes: 4 additions & 0 deletions faq/deploy-and-maintain-faq.md
@@ -266,6 +266,10 @@ PD can tolerate any synchronization error, but a larger error value means a larg

A client can access the cluster only through TiDB. TiDB connects to PD and TiKV, which are transparent to the client. When TiDB connects to any PD, that PD tells TiDB which PD is the current leader. If the connected PD is not the leader, TiDB reconnects to the leader PD.

#### What is the relationship between each status (Up, Disconnect, Offline, Down, Tombstone) of a TiKV store?

You can use PD Control to check the status information of a TiKV store. For the relationship between each status, refer to [Relationship between each status of a TiKV store](/tidb-scheduling.md#information-collection).
Contributor

The heading asks about the relationship between the statuses, so it does not seem necessary to explain how to check the status. Also, I suggest describing the relationship directly here; jumping back and forth between links is not a good experience. Would attaching the figure below be enough?
image

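Contributor Author

The relationship between the statuses is actually not shown very clearly in this figure, so it is better suited as supporting material.

Also, to answer the question in the heading directly, I think the order of the two sentences could be swapped. After all, once the relationship is explained, some readers may also wonder how to check these statuses.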

en-jin19 marked this conversation as resolved.

#### What is the difference between the `leader-schedule-limit` and `region-schedule-limit` scheduling parameters in PD?

- The `leader-schedule-limit` scheduling parameter is used to balance the Leader number of different TiKV servers, affecting the load of query processing.
Binary file added media/tikv-store-status-relationship.png
3 changes: 2 additions & 1 deletion pd-control.md
@@ -840,7 +840,8 @@ Usage:

> **Note:**
>
> When you use the `store limit` command, the original `region-add` and `region-remove` are deprecated. Use `add-peer` and `remove-peer` instead.
> - When you use the `store limit` command, the original `region-add` and `region-remove` are deprecated. Use `add-peer` and `remove-peer` instead.
> - To check the status information (Up, Disconnect, Offline, Down, or Tombstone) of a TiKV store, use PD Control. For the relationship between each status, refer to [Relationship between each status of a TiKV store](/tidb-scheduling.md#information-collection).
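
As a rough illustration of checking store status with PD Control, the sketch below summarizes JSON in the shape printed by the `store` command. The sample data is made up, and the field names (`stores`, `store.state_name`, `status.leader_count`, `status.region_count`) are assumed to match your PD version's output; `summarize` is a hypothetical helper, not part of PD Control.

```python
import json
from collections import Counter

# Made-up sample in the assumed shape of the PD Control `store` output.
SAMPLE_OUTPUT = """
{
  "count": 3,
  "stores": [
    {"store": {"id": 1, "state_name": "Up"},
     "status": {"leader_count": 10, "region_count": 30}},
    {"store": {"id": 4, "state_name": "Offline"},
     "status": {"leader_count": 0, "region_count": 12}},
    {"store": {"id": 5, "state_name": "Up"},
     "status": {"leader_count": 11, "region_count": 30}}
  ]
}
"""


def summarize(raw):
    """Count stores per status and list the stores that are still being taken offline."""
    data = json.loads(raw)
    counts = Counter()
    draining = []
    for item in data.get("stores", []):
        state = item["store"]["state_name"]
        counts[state] += 1
        if state == "Offline":
            status = item.get("status", {})
            # Remaining leaders and Regions indicate how far the migration has progressed.
            draining.append((item["store"]["id"],
                             status.get("leader_count", 0),
                             status.get("region_count", 0)))
    return counts, draining


counts, draining = summarize(SAMPLE_OUTPUT)
for store_id, leaders, regions in draining:
    print(f"store {store_id}: {leaders} leaders, {regions} Regions remaining")
```

In this sample, store 4 is still "Offline" because it holds Regions that have not yet been moved elsewhere.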

### `log [fatal | error | warn | info | debug]`

10 changes: 10 additions & 0 deletions tidb-scheduling.md
@@ -77,6 +77,16 @@ Scheduling is based on information collection. In short, the PD scheduling compo
* Whether the store is overloaded
* Labels (See [Perception of Topology](/schedule-replicas-by-topology-labels.md))

You can use PD Control to check the status of a TiKV store. The status is one of Up, Disconnect, Offline, Down, or Tombstone. The statuses relate to each other as follows:

+ **Up**: The current TiKV store is providing service.
+ **Disconnect**: When PD has received no heartbeat from the TiKV store for more than 20 seconds, the store status changes to "Disconnect". If the heartbeat stays lost for longer than the time specified by `max-store-down-time`, the store status changes to "Down".
+ **Down**: When the TiKV store has been disconnected from the cluster for longer than the time specified by `max-store-down-time` (30 minutes by default), the store status changes to "Down". In this status, the replicas of each Region on the store start being replenished on the surviving stores.
+ **Offline**: If you manually take a TiKV store offline through PD Control, the store status changes to "Offline". This is only an intermediate status while the store is being taken offline. A store in this status performs leader transfer and Region balance. When `leader_count` and `region_count` (obtained through PD Control) show that both operations have completed, the store status changes from "Offline" to "Tombstone". While the store is "Offline", do **not** stop the store service or shut down the physical server where the store is located.
+ **Tombstone**: This status indicates that the TiKV store is completely offline. You can use the `remove-tombstone` interface to safely clean up TiKV stores in this status.

![TiKV store status relationship](/media/tikv-store-status-relationship.png)
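
The transitions described above can be sketched as a small decision function. This is only an illustrative model, not PD's implementation: `store_status` and its parameters are hypothetical, and treating "both operations completed" as `leader_count` and `region_count` reaching zero is an assumption.

```python
from datetime import timedelta

# Thresholds from the description above; 30 minutes is the documented
# default of `max-store-down-time`.
DISCONNECT_AFTER = timedelta(seconds=20)
DEFAULT_MAX_STORE_DOWN_TIME = timedelta(minutes=30)


def store_status(since_last_heartbeat, taken_offline=False,
                 leader_count=0, region_count=0,
                 max_store_down_time=DEFAULT_MAX_STORE_DOWN_TIME):
    """Hypothetical helper: map a store's observed condition to a status name."""
    if taken_offline:
        # Offline is intermediate. Assumption: the store becomes Tombstone
        # once no leaders and no Regions remain on it.
        if leader_count == 0 and region_count == 0:
            return "Tombstone"
        return "Offline"
    if since_last_heartbeat > max_store_down_time:
        return "Down"
    if since_last_heartbeat > DISCONNECT_AFTER:
        return "Disconnect"
    return "Up"


print(store_status(timedelta(seconds=5)))
print(store_status(timedelta(seconds=45)))
print(store_status(timedelta(minutes=31)))
print(store_status(timedelta(seconds=1), taken_offline=True, region_count=8))
```

The manual "Offline" path is checked first because, in this model, it is triggered by an operator action rather than by heartbeat loss.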

- Information reported by Region leaders:

Each Region leader sends heartbeats to PD periodically to report [`RegionState`](https://github.com/pingcap/kvproto/blob/master/proto/pdpb.proto#L312), including: