QPS drops to zero, and PITR/CDC lag is also affected, when a PD leader IO hang or IO delay (500ms or 1s) is injected, because the etcd leader transfer fails #8204

Open
Lily2025 opened this issue May 21, 2024 · 4 comments
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments

Lily2025 commented May 21, 2024

Bug Report

What did you do?

1. Run TPC-C.
2. Inject an IO hang on the PD leader, lasting 5 minutes.

What did you expect to see?

QPS recovers within 2 minutes.

What did you see instead?

QPS drops to zero while the IO hang is injected on the PD leader.
[image: img_v3_02b3_fcc533fe-6dec-40c5-8a12-cc9601a3434g]
[image: img_v3_02b3_ed19d944-62f5-4cf9-95b4-412749df9c6g]

With an IO delay of 500ms:
[image]
[image]

What version of PD are you using (pd-server -V)?

./pd-server -V
Release Version: v8.1.0
Edition: Community
Git Commit Hash: fca469c
Git Branch: HEAD
UTC Build Time: 2024-05-09 02:15:45
2024-05-17T01:42:48.440+0800

@Lily2025 Lily2025 added the type/bug The issue is confirmed as a bug. label May 21, 2024
@Lily2025
Author

/assign JmPotato
/type enhancement

@ti-chi-bot ti-chi-bot bot added the type/enhancement The issue or PR belongs to an enhancement. label May 21, 2024
@JmPotato JmPotato removed the type/bug The issue is confirmed as a bug. label May 21, 2024
@JmPotato
Member

According to the logs, after detecting repeated elections, the PD leader attempts to resign as etcd leader. However, because the IO hang affects etcd's internal state machine, the resignation cannot complete. As a result, a PD leader cannot be elected properly, and the etcd leader cannot be transferred to another healthy node either. The current workaround is to kill the unhealthy node directly to force a re-election.
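
A minimal sketch of how that workaround could be automated, assuming a watchdog that bounds the resignation attempt with a deadline and falls back to terminating the node when the transfer does not finish. This is not PD's or etcd's code: `resign_with_deadline`, `transfer_leader`, and `force_exit` are hypothetical names used only for illustration, and the timeout value is arbitrary.

```python
import threading


def resign_with_deadline(transfer_leader, force_exit, timeout_s):
    """Try to hand off leadership; if it hangs or fails, force the node out.

    transfer_leader: hypothetical callable that blocks until the leader
                     transfer completes (it may hang under an IO hang).
    force_exit:      hypothetical callable that kills the unhealthy node so
                     the remaining members can hold a new election.
    timeout_s:       how long to wait for the transfer before giving up.
    """
    done = threading.Event()
    outcome = {"ok": False}

    def worker():
        try:
            transfer_leader()
            outcome["ok"] = True
        except Exception:
            # A failed transfer is treated the same way as a hung one.
            outcome["ok"] = False
        finally:
            done.set()

    # Daemon thread: a transfer that never returns must not block shutdown.
    threading.Thread(target=worker, daemon=True).start()

    if done.wait(timeout_s) and outcome["ok"]:
        return "transferred"

    # The transfer hung or failed: fall back to the manual workaround.
    force_exit()
    return "forced"
```

The point of the sketch is only that the fallback does not depend on the stuck component: the deadline is enforced from a separate thread, so a transfer that never returns still leads to the node being removed.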


Lily2025 commented Aug 6, 2024

img_v3_02dg_d6051a61-7835-46f8-84bf-ceeb9199ecfg


Lily2025 commented Aug 6, 2024

image

Projects
Status: Need Triage